how to calculate percentile value of number in dataframe column grouped by index

Time:09-28

I have a dataframe like this:

df:
        Score
group
  A      100
  A      34
  A      40
  A      30
  C      24
  C      60
  C      35

For every group in the data, I want to find the percentile rank of the score 35 (i.e. the percentile at which 35 falls within each group's scores).

I tried different tricks but none of them worked.

scipy.stats.percentileofscore(df['Score'], 35, kind='weak')
 --> This is working but this doesn't give me the percentile grouped by index

df.groupby('group')['Score'].percentileofscore()
 --> 'SeriesGroupBy' object has no attribute 'percentileofscore'

scipy.stats.percentileofscore(df.groupby('group')[['Score']], 35, kind='strict')
 --> TypeError: '<' not supported between instances of 'str' and 'int'

My ideal output looks like this:

df:
        Score Percentile
group 
  A       50
  C       33

Can anyone suggest to me what works well here?

CodePudding user response:

The inverse quantile function of a sequence at a point X is the proportion of values in the sequence that are less than X, right? So:

In [158]: df["Score"].lt(35).groupby(df["group"]).mean().mul(100)
Out[158]:
group
A    50.000000
C    33.333333
Name: Score, dtype: float64
  • get a True/False Series of whether < 35 or not on "Score"
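  • group this Series over "group" (if "group" is the index rather than a column, as in the question's printout, use .groupby(level="group") instead)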
  • take the mean
    • since True == 1 and False == 0, it will effectively give the proportion!
  • multiply by 100 to get percentages
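The steps above can be sketched end to end on the question's sample data. This is a minimal sketch, assuming "group" is the index of the frame (as the question's printout suggests), which is why it groups with level="group"; the name "Score Percentile" for the result is taken from the desired output.

```python
import pandas as pd

# Sample data from the question; "group" is assumed to be the index.
df = pd.DataFrame(
    {"Score": [100, 34, 40, 30, 24, 60, 35]},
    index=pd.Index(["A", "A", "A", "A", "C", "C", "C"], name="group"),
)

below = df["Score"].lt(35)                   # boolean Series: True where Score < 35
share = below.groupby(level="group").mean()  # fraction of True values per group
result = share.mul(100).rename("Score Percentile")

print(result)
```

Because the comparison is strict, the score of 35 in group C is not counted as being below itself.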

CodePudding user response:

Assuming you want the top percentile, i.e. the share of scores that lie above 35, use 100 - percentileofscore inside agg:

from scipy.stats import percentileofscore

df.groupby('group').agg(
    Score_Percentile=('Score', lambda x: 100 - percentileofscore(x, 35, kind='weak'))
).astype(int)

Output:

       Score_Percentile
group   
A      50
C      33
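To see why 100 minus the 'weak' percentile gives the share above 35, it may help to spell out the two definitions by hand. The sketch below is not scipy's implementation; percentile_of_score is a hypothetical pure-Python helper that only mirrors the documented meaning of kind='strict' (values below the score) and kind='weak' (values at or below the score).

```python
def percentile_of_score(values, score, kind="strict"):
    """Percentage of values below `score` ('strict') or at/below it ('weak')."""
    if kind == "strict":
        hits = sum(v < score for v in values)
    elif kind == "weak":
        hits = sum(v <= score for v in values)
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    return 100 * hits / len(values)


# Scores per group, copied from the question.
groups = {"A": [100, 34, 40, 30], "C": [24, 60, 35]}

for name, scores in groups.items():
    strict = percentile_of_score(scores, 35, "strict")
    weak = percentile_of_score(scores, 35, "weak")
    # 100 - weak is the share of scores strictly above 35.
    print(name, round(strict, 2), round(weak, 2), round(100 - weak, 2))
```

For group C the two kinds differ (about 33.3 for 'strict', about 66.7 for 'weak') because the group contains the value 35 itself. On this data the share below 35 and the share above 35 happen to coincide in both groups, which is not true in general.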

CodePudding user response:

To answer more generally: you're looking to do a custom aggregation on each group, which pandas lets you do with the agg method.

You can define the function yourself or use one from a library:

import pandas as pd

def percentileofscore(ser: pd.Series) -> float:
    # Share of scores strictly above 35, in percent; use `<` for the share below.
    return 100 * (ser > 35).sum() / ser.size

df.groupby("group").agg(percentileofscore)

Output:

      Score
group   
A     50.000000
C     33.333333
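If the threshold should not be hard-coded, one option is to pass it in through a small wrapper. This is only a sketch under assumptions: percent_below is a hypothetical helper name, it counts values strictly below the threshold, and the sample frame is rebuilt with "group" as the index.

```python
import pandas as pd

# Sample data from the question; "group" is assumed to be the index.
df = pd.DataFrame(
    {"Score": [100, 34, 40, 30, 24, 60, 35]},
    index=pd.Index(["A", "A", "A", "A", "C", "C", "C"], name="group"),
)


def percent_below(ser: pd.Series, threshold: float) -> float:
    """Hypothetical helper: percentage of values strictly below `threshold`."""
    return 100 * (ser < threshold).sum() / ser.size


# The lambda fixes the threshold so agg can call it with one group at a time.
result = df.groupby(level="group")["Score"].agg(lambda s: percent_below(s, 35))

print(result)
```

Changing the 35 in the lambda is then enough to ask the same question for another score.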