I have df with binary column (my_column
) and an id
columns.
I want to create a new column that for each row sums all the values in the binary column which belong to the same id up to his row but does not include the row itself.
Sample Input:
id my_column
0 111 0
1 111 1
2 111 1
3 111 0
4 222 1
5 222 1
6 222 1
7 222 1
Sample output:
id my_column new_column
0 111 0 0
1 111 1 0
2 111 1 1
3 111 0 2
4 222 1 0
5 222 1 1
6 222 1 2
7 222 1 3
CodePudding user response:
You can use transform
with cumsum
:
df['new_column'] = df.groupby('id').my_column.transform('cumsum') - df.my_column
CodePudding user response:
Using loops (non-vectorized code which should be slower as compared to more elegant answer by @stef)
import pandas as pd
my_dict = {
'id':[111,111,111,111,222,222,222,222, 111],
'my_column':[0,1,1,0,1,1,1,1,1]
}
df = pd.DataFrame(my_dict)
print (df)
new_column = []
for i in range(len(df)):
adf = df[:i]
adf = adf[adf['id'] == df.iloc[i]['id']]
if len(adf) > 0:
new_column.append(sum(adf['my_column']))
else:
new_column.append(0)
df['new_column'] = new_column
print (df)
Output:
id my_column
0 111 0
1 111 1
2 111 1
3 111 0
4 222 1
5 222 1
6 222 1
7 222 1
8 111 1
id my_column new_column
0 111 0 0
1 111 1 0
2 111 1 1
3 111 0 2
4 222 1 0
5 222 1 1
6 222 1 2
7 222 1 3
8 111 1 2