Aggregate only one of the duplicated values with groupby pandas-CodePudding

I have the following data with the last column as the desired output:

activity	teacher	group	students	the desired column
One	A	a	3	5
One	B	b	2	5
two	A	c	7	7
One	D	a	3	5
two	C	c	7	7

I want to group by the activity and returen the number of students without dublicate the student when we have more than one teacher for them. I tried the following but it's dublicate the sum with for the same group.

df.groupby('activity').students.transform('sum')

the output of this looks like:

activity	teacher	group	students	the output column
One	A	a	3	8
One	B	b	2	8
two	A	c	7	14
One	A	a	3	8
two	C	c	7	14

thank you in advance for any suggestion.

CodePudding user response：

IIUC:

x = (
    df.drop_duplicates(subset=["activity", "group"])
    .groupby("activity")["students"]
    .sum()
)
df["the desired column"] = df["activity"].map(x)
print(df)

Prints:

  activity teacher group  students  the desired column
0      One       A     a         3                   5
1      One       B     b         2                   5
2      two       A     c         7                   7
3      One       D     a         3                   5
4      two       C     c         7                   7