Pandas groupby results assign to a new column

Hi, I'm trying to create a new column in my dataframe, and I want its values to be based on a calculation: each Student's share of the total Score within their Class. There are two different students with the same name in different classes, which is why the first groupby below is on both Class and Student.

df['share'] = df.groupby(['Class', 'Student'])['Score'].agg('sum')/df.groupby(['Class'])['Score'].agg('sum')

With the code above, I get the error "incompatible index of inserted column with frame index".

Can someone please help? Thanks.

CodePudding user response：

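The problem is that a groupby aggregate is indexed by the unique values of the columns you group by, not by the original row labels of df, so pandas cannot line it up with the frame when you assign it as a column. In your case the share is each student's total divided by the class's total, so the code below builds a new dataframe with one row per (Class, Student) and that student's share. That is how I understood your problem.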

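# Student totals within each class divided by the class totals; the result is indexed by (Class, Student)
ndf = df.groupby(['Class', 'Student'])['Score'].agg('sum') / df.groupby(['Class'])['Score'].agg('sum')
# Move the Class and Student index levels back into ordinary columns and name the value column
ndf = ndf.reset_index(name='share')
ndf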
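
If the share is needed as a column on the original frame rather than in a separate one, one option is to merge the aggregated result back on the grouping keys. This is only a minimal sketch: the data below is made up for illustration, and the column name share is an assumption, not something from the question.

```python
import pandas as pd

# Made-up data for illustration: "Ann" appears in both classes (two different
# people), and the Ann in class A has two score rows.
df = pd.DataFrame({
    "Class":   ["A", "A", "A", "B", "B"],
    "Student": ["Ann", "Ann", "Bob", "Ann", "Cid"],
    "Score":   [10, 30, 60, 20, 80],
})

student_total = df.groupby(["Class", "Student"])["Score"].sum()
class_total = df.groupby("Class")["Score"].sum()

# The aggregate is indexed by (Class, Student), not by df's row labels,
# which is why assigning it straight to df['share'] fails.
print(list(student_total.index.names))

# Divide, matching the class totals on the 'Class' level of the MultiIndex,
# then turn the index levels back into columns.
ndf = student_total.div(class_total, level="Class").reset_index(name="share")

# A left merge on the grouping keys attaches the share to every original row.
out = df.merge(ndf, on=["Class", "Student"], how="left")
print(out)
```

The merge keeps one row per original row, so a student with several rows in a class gets the same share repeated on each of them.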

CodePudding user response：

If I understood you correctly, given an example df like the following:

    Class Student  Score        
1       1       1     99
2       1       2     60
3       1       3     90
4       1       4     50
5       2       1     93
6       2       2     93
7       2       3     67
8       2       4     58
9       3       1     54
10      3       2     29
11      3       3     34
12      3       4     46

Do you need the following result?

    Class Student  Score  Score_Share
1       1       1     99     0.331104
2       1       2     60     0.200669
3       1       3     90     0.301003
4       1       4     50     0.167224
5       2       1     93     0.299035
6       2       2     93     0.299035
7       2       3     67     0.215434
8       2       4     58     0.186495
9       3       1     54     0.331288
10      3       2     29     0.177914
11      3       3     34     0.208589
12      3       4     46     0.282209

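If so, that can be done directly with:

    df['Score_Share'] = df.groupby('Class')['Score'].transform(lambda x: x / x.sum())

transform applies the operation within each group and returns a result aligned with the original rows, so it can be assigned as a new column. I use transform rather than apply here because, as far as I know, recent pandas versions (2.0 and later) add the group key to the index of the apply result, which brings back the same incompatible index error.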
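
To check this end to end, here is a self-contained sketch that rebuilds the example frame above and computes the column with transform('sum'), a variant of the one-liner that divides each score by its class total. The 1..12 row labels are copied from the example; everything else is plain pandas.

```python
import pandas as pd

# The example data from above, with the same 1..12 row labels.
df = pd.DataFrame(
    {
        "Class":   [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        "Student": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        "Score":   [99, 60, 90, 50, 93, 93, 67, 58, 54, 29, 34, 46],
    },
    index=range(1, 13),
)

# transform('sum') returns the class total repeated on every row of that class,
# so the division lines up with df's own index.
class_total = df.groupby("Class")["Score"].transform("sum")
df["Score_Share"] = df["Score"] / class_total

# Sanity check: the shares within each class should add up to 1.
share_sums = df.groupby("Class")["Score_Share"].sum()
print(share_sums)
print(df)
```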

PS: I don't see why students with the same name in different classes would be a problem here, since the share is computed within each class, so I may be misunderstanding something. I'll edit this according to your response. I can't leave a comment because I'm new here :)
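
In case the concern is that a student can have several rows within a class, or that grouping on Student alone would mix up namesakes from different classes, here is a hedged sketch that groups on both keys and still writes straight into the original frame. The data and the column name share are hypothetical.

```python
import pandas as pd

# Hypothetical data: "Sam" exists in class X and in class Y (two different
# people), and some students have more than one score row.
df = pd.DataFrame({
    "Class":   ["X", "X", "X", "Y", "Y", "Y"],
    "Student": ["Sam", "Sam", "Lee", "Sam", "Kim", "Kim"],
    "Score":   [20, 20, 60, 50, 25, 25],
})

# Total per student within a class, broadcast back to every row.
student_total = df.groupby(["Class", "Student"])["Score"].transform("sum")
# Total per class, broadcast back to every row.
class_total = df.groupby("Class")["Score"].transform("sum")

# Both Series share df's index, so the result can be assigned directly.
df["share"] = student_total / class_total
print(df)
```

Because both totals come from transform, no merge or reset_index step is needed; the trade-off is that the share is repeated on every row belonging to the same student.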