I have a DataFrame:
key1 | key2 | key3 | value1 | value2 |
---|---|---|---|---|
1 | a | s2 | 3 | 4 |
1 | a | s2 | 2 | 3 |
2 | b | j6 | 1 | 1 |
and I want this as the result:
key1 | key2 | key3 | value1 | value2 | sum_value1 | sum_value2 |
---|---|---|---|---|---|---|
1 | a | s2 | 3 | 4 | 5 | 7 |
1 | a | s2 | 2 | 3 | 5 | 7 |
2 | b | j6 | 1 | 1 | 1 | 1 |
sum_value1 is the sum of value1 within each group defined by key1, key2 and key3, and sum_value2 is the same for value2.
How can I get this? Thank you!
What I have tried so far:
df["sum_value1"] = df["value1"].groupby(["key1","key2","key3"]).transform('sum')
CodePudding user response:
Group the DataFrame by the key columns first, then select the value columns and use transform, which returns the group sum aligned to every original row:
df[['sum_value1', 'sum_value2']] = df.groupby(['key1', 'key2', 'key3'])[['value1', 'value2']].transform('sum')
df
key1 key2 key3 value1 value2 sum_value1 sum_value2
0 1 a s2 3 4 5 7
1 1 a s2 2 3 5 7
2 2 b j6 1 1 1 1
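For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the sample data in the question. The attempt in the question selects value1 before grouping, so the resulting Series no longer carries the key columns; grouping the whole DataFrame first avoids that. The variable names keys, values, sums and result are only illustrative, and joining with add_prefix is one possible way to attach the new columns, not the only one.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key1": [1, 1, 2],
    "key2": ["a", "a", "b"],
    "key3": ["s2", "s2", "j6"],
    "value1": [3, 2, 1],
    "value2": [4, 3, 1],
})

keys = ["key1", "key2", "key3"]    # grouping columns
values = ["value1", "value2"]      # columns to sum within each group

# transform("sum") returns a frame with the same index as df,
# so every row receives the total of its own group.
sums = df.groupby(keys)[values].transform("sum")

# Prefix the summed columns and attach them by index.
result = df.join(sums.add_prefix("sum_"))
print(result)
```

With this data, sum_value1 comes out as 5, 5, 1 and sum_value2 as 7, 7, 1, which matches the desired result in the question.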