I have the following dataframe:
col1 col2 col3
0 prog1 id001 1
1 prog2 id001 0
2 prog2 id001 1
3 prog1 id002 0
4 prog3 id002 0
5 prog4 id002 1
6 prog2 id003 0
7 prog5 id003 1
8 prog5 id003 1
9 prog3 id004 1
Goal:
I want to add a new column calculating a sum as follows:
col1 col2 col3 sum_prog
0 prog1 id001 1 1
1 prog2 id001 0 1
2 prog2 id001 1 1
3 prog1 id002 0 1
4 prog3 id002 0 1
5 prog4 id002 1 1
6 prog2 id003 0 1
7 prog5 id003 1 2
8 prog5 id003 1 2
9 prog3 id004 1 1
As you can see, the prog5 rows (under id003) have sum_prog == 2, because prog5 occurs twice in 'col1' and the corresponding values in 'col3' sum to 2.
I tried to solve this with df.groupby(), but so far I haven't gotten the desired output.
Answer:
You can use .transform to generate the pd.Series directly. Group on "col1" and sum the values in "col3".
df["sum_prog"] = df.groupby(["col1"]).col3.transform("sum")
Output:
col1 col2 col3 sum_prog
0 prog1 id001 1 1
1 prog2 id001 0 1
2 prog2 id001 1 1
3 prog1 id002 0 1
4 prog3 id002 0 1
5 prog4 id002 1 1
6 prog2 id003 0 1
7 prog5 id003 1 2
8 prog5 id003 1 2
9 prog3 id004 1 1
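For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question by hand (the literal lists are copied from the table above) and applies the same groupby/transform call. The point it illustrates is that transform("sum") returns a result with the same index as the original frame, so it can be assigned straight to a new column, whereas a plain .sum() on the groupby would return one row per group.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the table above).
df = pd.DataFrame({
    "col1": ["prog1", "prog2", "prog2", "prog1", "prog3",
             "prog4", "prog2", "prog5", "prog5", "prog3"],
    "col2": ["id001", "id001", "id001", "id002", "id002",
             "id002", "id003", "id003", "id003", "id004"],
    "col3": [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
})

# Sum col3 within each col1 group and broadcast the group total
# back to every row of that group (same index as df).
df["sum_prog"] = df.groupby("col1")["col3"].transform("sum")

print(df)
```

Note that this groups on "col1" only, which is what matches the desired output in the question. If the sum were meant per program and id, you could pass ["col1", "col2"] to groupby instead, but that would change some rows (for example, row 6 would become 0).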