How I summed all the values in a particular column up to the current line-CodePudding

I have df with binary column (my_column) and an id columns. I want to create a new column that for each row sums all the values in the binary column which belong to the same id up to his row but does not include the row itself.

Sample Input:

    id  my_column
0   111 0
1   111 1
2   111 1
3   111 0
4   222 1
5   222 1
6   222 1
7   222 1

Sample output:

    id  my_column   new_column
0   111 0           0
1   111 1           0
2   111 1           1
3   111 0           2
4   222 1           0
5   222 1           1
6   222 1           2
7   222 1           3

CodePudding user response：

You can use transform with cumsum:

df['new_column'] = df.groupby('id').my_column.transform('cumsum') - df.my_column

CodePudding user response：

Using loops (non-vectorized code which should be slower as compared to more elegant answer by @stef)

import pandas as pd

my_dict = {
    'id':[111,111,111,111,222,222,222,222, 111],
      'my_column':[0,1,1,0,1,1,1,1,1]
}
df = pd.DataFrame(my_dict)        
print (df)

new_column = []
for i in range(len(df)):
  adf = df[:i]
  adf = adf[adf['id'] == df.iloc[i]['id']]  
  if len(adf) > 0:
    new_column.append(sum(adf['my_column']))    
  else:
    new_column.append(0)

df['new_column'] = new_column
print (df)

Output:

    id  my_column
0  111          0
1  111          1
2  111          1
3  111          0
4  222          1
5  222          1
6  222          1
7  222          1
8  111          1

    id  my_column  new_column
0  111          0           0
1  111          1           0
2  111          1           1
3  111          0           2
4  222          1           0
5  222          1           1
6  222          1           2
7  222          1           3
8  111          1           2