Home > Enterprise >  Aggregate only one of the duplicated values with groupby pandas
Aggregate only one of the duplicated values with groupby pandas

Time:04-05

I have the following data with the last column as the desired output:

activity teacher group students the desired column
One A a 3 5
One B b 2 5
two A c 7 7
One D a 3 5
two C c 7 7

I want to group by the activity and returen the number of students without dublicate the student when we have more than one teacher for them. I tried the following but it's dublicate the sum with for the same group.

df.groupby('activity').students.transform('sum')

the output of this looks like:

activity teacher group students the output column
One A a 3 8
One B b 2 8
two A c 7 14
One A a 3 8
two C c 7 14

thank you in advance for any suggestion.

CodePudding user response:

IIUC:

x = (
    df.drop_duplicates(subset=["activity", "group"])
    .groupby("activity")["students"]
    .sum()
)
df["the desired column"] = df["activity"].map(x)
print(df)

Prints:

  activity teacher group  students  the desired column
0      One       A     a         3                   5
1      One       B     b         2                   5
2      two       A     c         7                   7
3      One       D     a         3                   5
4      two       C     c         7                   7
  • Related