Create a custom percentile rank for a pandas series

Time:02-13

I need to calculate percentile ranks using a specific algorithm that I could not find in either pandas' Series.rank() or NumPy.

For each element i of the series, the rank is defined as:

rank[i] = (# of values less than s[i] + 0.5 * # of other values equal to s[i]) / total # of values

For example, given the following series:

s=pd.Series(data=[5,3,8,1,9,4,14,12,6,1,1,4,15])
  • For the first element (5), there are 6 values less than 5 and no other values equal to 5, so the rank is (6 + 0 * 0.5)/13 = 6/13.

  • For the fourth element (1), there are no values less than 1 and 2 other values equal to 1, so the rank is (0 + 2 * 0.5)/13 = 1/13.
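As a plain reference for the definition, the sketch below applies the formula element by element using exact fractions. It loops over the data, so it is not the loop-free solution being asked for, but it reproduces the two worked examples and can serve as a check for a vectorised version. The function name percentile_rank_reference is a hypothetical helper, not a pandas or NumPy API.

```python
from fractions import Fraction


def percentile_rank_reference(values):
    """Hypothetical helper: naive O(n^2) percentile rank with exact fractions.

    rank[i] = (count of values < values[i]
               + 0.5 * count of *other* values == values[i]) / n
    """
    n = len(values)
    ranks = []
    for i, v in enumerate(values):
        # Values strictly smaller than the current element.
        less = sum(1 for w in values if w < v)
        # Equal values, excluding the element itself.
        others_equal = sum(1 for j, w in enumerate(values) if w == v and j != i)
        ranks.append((less + Fraction(others_equal, 2)) / n)
    return ranks


data = [5, 3, 8, 1, 9, 4, 14, 12, 6, 1, 1, 4, 15]
ranks = percentile_rank_reference(data)
print(ranks[0])  # first element (5): 6/13
print(ranks[3])  # fourth element (1): 1/13
```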

How could I calculate this without using a loop? I assume a combination of s.apply() and/or s.where() would work, but I can't figure it out and haven't found anything by searching. I want to apply this to the entire series at once, with the result being a series of percentile ranks.

Answer:

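You could use NumPy broadcasting. First convert s to a NumPy array and take a column view of it (tmp[:, None], shape (n, 1)). Comparing that column with the original array broadcasts to an n x n boolean matrix, so summing along each row counts, for each element i, how many values are less than i. Counting the values equal to i works the same way, except that you need to subtract 1, since i is equal to itself. Finally, add the "less than" count to half the "equal" count, divide by the length, and build a Series: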

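import pandas as pd
s = pd.Series(data=[5, 3, 8, 1, 9, 4, 14, 12, 6, 1, 1, 4, 15])
tmp = s.to_numpy()
s_col = tmp[:, None]  # column of shape (n, 1); compared with tmp it broadcasts to (n, n)
# Row i counts the values that s[i] is greater than, i.e. the values less than s[i]
less_than_i_count = (s_col > tmp).sum(axis=1)
# Row i counts the values equal to s[i]; subtract 1 for s[i] itself, then weight by 0.5
eq_to_i_count = ((s_col == tmp).sum(axis=1) - 1) * 0.5
ranks = pd.Series((less_than_i_count + eq_to_i_count) / len(s), index=s.index)
print(ranks)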

Output:

0     0.461538
1     0.230769
2     0.615385
3     0.076923
4     0.692308
5     0.346154
6     0.846154
7     0.769231
8     0.538462
9     0.076923
10    0.076923
11    0.346154
12    0.923077
dtype: float64
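The broadcasting version builds n x n boolean matrices, so memory grows quadratically with the length of the Series. As a hedged alternative sketch, assuming the Series holds no NaN values, two standard routes avoid that. One sorts the values once and uses np.searchsorted to count smaller and equal values. The other relies on the fact that pandas' rank(method="average") gives a tied value (# smaller values) + (# equal values including itself + 1) / 2, so (average rank - 1) / n equals the formula from the question. The function names percentile_rank_sorted and percentile_rank_via_rank are hypothetical.

```python
import numpy as np
import pandas as pd


def percentile_rank_sorted(s: pd.Series) -> pd.Series:
    """Hypothetical helper: same formula, O(n log n) via sorting (assumes no NaN)."""
    values = s.to_numpy()
    sorted_vals = np.sort(values)
    # Number of values strictly less than each element.
    less = np.searchsorted(sorted_vals, values, side="left")
    # Number of values less than or equal to each element.
    less_or_equal = np.searchsorted(sorted_vals, values, side="right")
    # Other equal values = all equal values minus the element itself.
    others_equal = less_or_equal - less - 1
    return pd.Series((less + 0.5 * others_equal) / len(s), index=s.index)


def percentile_rank_via_rank(s: pd.Series) -> pd.Series:
    """Hypothetical helper: (average rank - 1) / n gives the same numbers."""
    return (s.rank(method="average") - 1) / len(s)


s = pd.Series(data=[5, 3, 8, 1, 9, 4, 14, 12, 6, 1, 1, 4, 15])
a = percentile_rank_sorted(s)
b = percentile_rank_via_rank(s)
print(np.allclose(a, b))  # True
```

Both helpers should agree with the broadcasting output; the rank-based one is probably the shortest if only the numbers are needed, while the searchsorted one spells out the counting explicitly.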