I have the following data with the last column as the desired output:
activity | teacher | group | students | the desired column |
---|---|---|---|---|
One | A | a | 3 | 5 |
One | B | b | 2 | 5 |
two | A | c | 7 | 7 |
One | D | a | 3 | 5 |
two | C | c | 7 | 7 |
I want to group by the activity and returen the number of students without dublicate the student when we have more than one teacher for them. I tried the following but it's dublicate the sum with for the same group.
df.groupby('activity').students.transform('sum')
the output of this looks like:
activity | teacher | group | students | the output column |
---|---|---|---|---|
One | A | a | 3 | 8 |
One | B | b | 2 | 8 |
two | A | c | 7 | 14 |
One | A | a | 3 | 8 |
two | C | c | 7 | 14 |
thank you in advance for any suggestion.
CodePudding user response:
IIUC:
x = (
df.drop_duplicates(subset=["activity", "group"])
.groupby("activity")["students"]
.sum()
)
df["the desired column"] = df["activity"].map(x)
print(df)
Prints:
activity teacher group students the desired column
0 One A a 3 5
1 One B b 2 5
2 two A c 7 7
3 One D a 3 5
4 two C c 7 7