How to sum up a column based on another columns value Python-CodePudding

I have this example df

col1 = [1,1,1,2,2,1,1,1,2,2,2]
col2 = [20, 23, 12, 44, 14, 42, 44, 1, 42, 62, 11]

data = {"col1": col1, "col2": col2}
df = pd.DataFrame(data)

I need to add a column that adds up the col2 every time the col1 is 1 and then the same for when it is 2. I have tried grouping by col1 but this skips every time there is a 2 in between The expected output would be this.

col1    col2  col3
1       20    55 
1       23    55
1       12    55
2       44    58
2       14    58
1       42    87
1       44    87
1       1     87
2       42    115
2       62    115
2       11    115

Please let me know how to approach this

CodePudding user response：

Use GroupBy.transform with helper Series for consecutive values generated by comapre shifted values for not equal and cumulative sum:

df['col3'] = df.groupby(df['col1'].ne(df['col1'].shift()).cumsum())['col2'].transform('sum')
print (df)
    col1  col2  col3
0      1    20    55
1      1    23    55
2      1    12    55
3      2    44    58
4      2    14    58
5      1    42    87
6      1    44    87
7      1     1    87
8      2    42   115
9      2    62   115
10     2    11   115

CodePudding user response：

You can do this by creating a column that will mark every time there is a change in col1 and then sum by groupby:

i = df.col1    
df['Var3'] = i.ne(i.shift()).cumsum()
df['sums'] = df.groupby(['Var3'])['col2'].transform('sum')

which gives

col1  col2  Var3  sums
0      1    20     1    55
1      1    23     1    55
2      1    12     1    55
3      2    44     2    58
4      2    14     2    58
5      1    42     3    87
6      1    44     3    87
7      1     1     3    87
8      2    42     4   115
9      2    62     4   115
10     2    11     4   115