I am stuck on a question about finding how many rows have a score above the average (mean) score.
My df looks like this:
  Subject   Name  Score
0      s1    Amy    100
1      s1    Bob     90
2      s1  Cathy     92
3      s1  David     88
4      s2   Emma     95
5      s2  Frank     80
6      s2   Gina     86
7      s2  Helen     89
...
I can get the mean of each subject with df.groupby('Subject').Score.mean(),
but I don't know how to find how many students scored above the average in each subject.
(I guess I could use a for loop to calculate the count, but I want to know if there is a way to do it in pandas.)
It would be great if anyone could help. Thank you.
CodePudding user response:
You can try using groupby and apply:
def count_above_avg(g):
    avg = g.Score.mean()
    return (g.Score > avg).sum()

df.groupby('Subject').apply(count_above_avg)
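For anyone who wants to run this end to end, here is a minimal sketch of the same groupby + apply idea. It assumes the df holds only the eight rows shown in the question (the real data continues after the "..."), and it selects the Score column before calling apply so the function receives a plain Series per subject. The variable name counts is just for illustration.

```python
import pandas as pd

# Assumption: only the eight rows visible in the question; the real df may be longer.
df = pd.DataFrame({
    "Subject": ["s1"] * 4 + ["s2"] * 4,
    "Name": ["Amy", "Bob", "Cathy", "David", "Emma", "Frank", "Gina", "Helen"],
    "Score": [100, 90, 92, 88, 95, 80, 86, 89],
})

def count_above_avg(scores):
    # `scores` is the Score Series of a single subject.
    # Comparing against the group mean gives booleans; summing counts the True values.
    return int((scores > scores.mean()).sum())

# Selecting ["Score"] first means apply only ever sees the score values.
counts = df.groupby("Subject")["Score"].apply(count_above_avg)
print(counts)
```

On these eight rows the subject means are 92.5 (s1) and 87.5 (s2), so the counts should come out as 1 and 2.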
CodePudding user response:
Use .transform, which computes a per-group result and broadcasts it back onto the original rows, so the current index is left unchanged.
# strict > because the question asks for scores *above* the subject average
df['is_above_subject_avg'] = (
    df['Score'] > df.groupby('Subject')['Score'].transform('mean')
)
df.groupby('Subject')['is_above_subject_avg'].sum()
Subject
s1 1
s2 2
Name: is_above_subject_avg, dtype: int64
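If you would rather not add a helper column to df, a possible variant is to keep the boolean mask as a separate Series and group it by the Subject column directly. This is only a sketch: it assumes the eight rows shown in the question, uses a strict > comparison (scores equal to the mean are not counted), and the names subject_mean, above and counts are made up for the example.

```python
import pandas as pd

# Assumption: only the eight rows visible in the question.
df = pd.DataFrame({
    "Subject": ["s1"] * 4 + ["s2"] * 4,
    "Name": ["Amy", "Bob", "Cathy", "David", "Emma", "Frank", "Gina", "Helen"],
    "Score": [100, 90, 92, 88, 95, 80, 86, 89],
})

# transform('mean') returns one value per original row: the mean of that row's subject.
subject_mean = df.groupby("Subject")["Score"].transform("mean")

# Boolean mask aligned with df's index; True where the score beats the subject mean.
above = df["Score"] > subject_mean

# Group the mask by the Subject column and count the True values per subject.
counts = above.groupby(df["Subject"]).sum()
print(counts)
```

Because the mask shares df's index, it can also be reused to filter the rows themselves, for example df[above] to list the students who are above their subject average.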