Home > Mobile >  How to sum up a column based on another columns value Python
How to sum up a column based on another columns value Python

Time:06-01

I have this example df

col1 = [1,1,1,2,2,1,1,1,2,2,2]
col2 = [20, 23, 12, 44, 14, 42, 44, 1, 42, 62, 11]

data = {"col1": col1, "col2": col2}
df = pd.DataFrame(data)

I need to add a column that adds up the col2 every time the col1 is 1 and then the same for when it is 2. I have tried grouping by col1 but this skips every time there is a 2 in between The expected output would be this.

col1    col2  col3
1       20    55 
1       23    55
1       12    55
2       44    58
2       14    58
1       42    87
1       44    87
1       1     87
2       42    115
2       62    115
2       11    115

Please let me know how to approach this

CodePudding user response:

Use GroupBy.transform with helper Series for consecutive values generated by comapre shifted values for not equal and cumulative sum:

df['col3'] = df.groupby(df['col1'].ne(df['col1'].shift()).cumsum())['col2'].transform('sum')
print (df)
    col1  col2  col3
0      1    20    55
1      1    23    55
2      1    12    55
3      2    44    58
4      2    14    58
5      1    42    87
6      1    44    87
7      1     1    87
8      2    42   115
9      2    62   115
10     2    11   115

CodePudding user response:

You can do this by creating a column that will mark every time there is a change in col1 and then sum by groupby:

i = df.col1    
df['Var3'] = i.ne(i.shift()).cumsum()
df['sums'] = df.groupby(['Var3'])['col2'].transform('sum')

which gives

col1  col2  Var3  sums
0      1    20     1    55
1      1    23     1    55
2      1    12     1    55
3      2    44     2    58
4      2    14     2    58
5      1    42     3    87
6      1    44     3    87
7      1     1     3    87
8      2    42     4   115
9      2    62     4   115
10     2    11     4   115
​
  • Related