In my last post on rank functions in Tableau, I found that the charts of Rank() and Rank_Percentile() can overlap each other exactly when drawn on matching scales. But the two have quite different meanings and uses.
Rank() is typically used in competitive events, which is why it is also called competition rank. However, when you tell people you ranked 3rd, they only know how far you are from the top. They have no idea how many are below you.
This is where Rank_Percentile() comes in. When you say you are at 75%, we know that 25% of the group are above you and 75% are no higher than you (yourself included). So percentile rank gives a sense of where you stand within the group as a whole, although the group size is not specified.
The common property of Rank() and Rank_Percentile() is that they are both biased towards the top. That is, when several data points share the same value, they all receive the highest possible position.
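To make the tie behaviour concrete, here is a minimal Python sketch. It is not Tableau's implementation; it only mimics the two definitions described above (rank counted from the top, percentile as the share of values no higher than the current one). The function names and the sample scores are made up for illustration.

```python
def competition_rank(values):
    """Rank from the top: 1 + number of strictly higher values.
    Tied values therefore share the best (smallest) rank."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def percentile_rank(values):
    """Share of values that are no higher than each value.
    Tied values all count each other, so they share the highest percentile."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


points = [6, 9, 9, 14]  # hypothetical scores with one tie
ranks = competition_rank(points)
percentiles = percentile_rank(points)

for value, rank, pct in zip(points, ranks, percentiles):
    print(value, rank, pct)
```

The two 9s both get rank 2 and percentile 0.75, and nobody is ranked 3rd: the tied values take the best position available to them.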
Given the size N of the dataset in question, we can calculate the percentile rank from the rank and, vice versa, the rank from the percentile. Below are the formulas I worked out, where 20 is the total number of clubs in the Premier League in our example:
Rank = 21-20*Percentile
and
Percentile = (21-Rank)/20
We see that Rank and Percentile have a simple linear relationship.
In general, the formulas are:
Rank = N+1-N*Percentile = 1+N*(1-Percentile)
and
Percentile = (N+1-Rank)/N = 1+(1-Rank)/N
where N is the size of the dataset.
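As a quick check of the general formulas, the following Python sketch converts in both directions for N = 20. The function names are my own, and the percentile is assumed to be a fraction between 0 and 1 rather than a value out of 100.

```python
def rank_from_percentile(percentile, n):
    """Rank = N + 1 - N * Percentile, with percentile given as a fraction."""
    return n + 1 - n * percentile


def percentile_from_rank(rank, n):
    """Percentile = (N + 1 - Rank) / N, returned as a fraction."""
    return (n + 1 - rank) / n


n = 20  # number of clubs in the Premier League example
for rank in range(1, n + 1):
    pct = percentile_from_rank(rank, n)
    # round() absorbs tiny floating-point error in the round trip
    recovered = round(rank_from_percentile(pct, n))
    print(rank, pct, recovered)
```

Under these assumptions, rank 1 maps to a percentile of 1.0, rank 20 maps to 1/20, and converting back recovers the original rank each time.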
The conclusion is simple: given N, we can easily calculate one from the other. Having both gives us a better understanding of the dataset.
How does percentile work?
Using your formula on the Sample Superstore dataset, I get different values.