How to find the number of rows above the mean in a pandas DataFrame?

Time:12-01

I am stuck on how to count the rows whose score is above the average (mean) score.

My df looks like this:

   Subject   Name    Score
0  s1        Amy     100
1  s1        Bob     90
2  s1        Cathy   92
3  s1        David   88
4  s2        Emma    95
5  s2        Frank   80
6  s2        Gina    86
7  s2        Helen   89 
...

I can get the mean of each subject with df.groupby('Subject').Score.mean().
But I don't know how to find how many students score above the average in each subject.
(I guess I could use a for loop to calculate the count, but I want to know if there is a way to do it in pandas.)

It would be great if anyone can help. Thank you.

CodePudding user response:

You can try using groupby and apply:

def count_above_avg(g):
    avg = g.Score.mean()
    return (g.Score > avg).sum()

df.groupby('Subject').apply(count_above_avg)
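
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the DataFrame holds only the eight rows shown in the question (the rows elided by "..." are not included), and the name above_avg_counts is just illustrative. It selects the Score column before calling apply, so the function receives a Series per subject rather than the whole group frame.

```python
import pandas as pd

# Sample data: only the eight rows shown in the question (an assumption).
df = pd.DataFrame({
    "Subject": ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"],
    "Name": ["Amy", "Bob", "Cathy", "David", "Emma", "Frank", "Gina", "Helen"],
    "Score": [100, 90, 92, 88, 95, 80, 86, 89],
})

# For each subject, count the scores strictly greater than that subject's mean.
above_avg_counts = df.groupby("Subject")["Score"].apply(
    lambda scores: (scores > scores.mean()).sum()
)

print(above_avg_counts)
```

With this sample the subject means are 92.5 for s1 and 87.5 for s2, so the counts come out as 1 and 2.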

CodePudding user response:

Use .transform, which applies a group-wise operation and returns a result aligned with the original index, so it can be compared row by row with the original column.

# Strictly greater than the subject mean, since the question asks for scores above average
df['is_above_subject_avg'] = (
    df['Score'] > df.groupby('Subject')['Score'].transform('mean')
)

df.groupby('Subject')['is_above_subject_avg'].sum()

Subject
s1    1
s2    2
Name: is_above_subject_avg, dtype: int64
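
If you would rather not add a helper column, the same idea can be sketched as a single chain. This again assumes only the eight rows shown in the question; subject_mean and counts are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data: only the eight rows shown in the question (an assumption).
df = pd.DataFrame({
    "Subject": ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"],
    "Name": ["Amy", "Bob", "Cathy", "David", "Emma", "Frank", "Gina", "Helen"],
    "Score": [100, 90, 92, 88, 95, 80, 86, 89],
})

# Per-row mean of the row's own subject, aligned with df's index.
subject_mean = df.groupby("Subject")["Score"].transform("mean")

# Boolean mask of rows above their subject mean, summed per subject.
# Grouping by df["Subject"] works because the mask shares df's index.
counts = df["Score"].gt(subject_mean).groupby(df["Subject"]).sum()

print(counts)
```

Here df itself is left unchanged, which can be convenient when the flag column is not needed afterwards.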