This is a bit hard to describe, but I have the following table:
import pandas as pd
my_dict = {'user_id': ['a', 'a', 'a', 'b', 'a', 'c', 'd', 'a', 'e', 'c', 'a'],
           'product': ['x', 'y', 'z', 'x', 'x', 't', 'x', 'y', 'x', 'z', 'x'],
           'viewed': [True, False, True, True, False, False, True, True, False, False, False]}
df = pd.DataFrame(my_dict)
df
and I would like to add a 'count' column holding the number of earlier rows that have the same user_id/product pair and viewed == True.
I tried the following:
df['count'] = df.groupby(['user_id', 'product', 'viewed']).cumcount()
df
but I get zeros everywhere except the last row (index 10), which gets a 1.
What I need is a 1 in 'count' at index 4 as well, because user a already viewed product x at index 0.
Please help.
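A quick way to see why the attempt above fails: with 'viewed' in the group key, the rows for user a / product x are split into an (a, x, True) group and an (a, x, False) group, so index 4 never sees the True at index 0. The sketch below rebuilds the question's data and prints the a/x rows next to the attempted count; `attempt` and `ax` are just illustrative names.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'a', 'c', 'd', 'a', 'e', 'c', 'a'],
    'product': ['x', 'y', 'z', 'x', 'x', 't', 'x', 'y', 'x', 'z', 'x'],
    'viewed': [True, False, True, True, False, False, True, True, False, False, False],
})

# The attempted approach: 'viewed' is part of the key, so (a, x, True) and
# (a, x, False) are separate groups, and cumcount just numbers rows 0, 1, 2, ...
# inside each group without looking at the other one.
attempt = df.groupby(['user_id', 'product', 'viewed']).cumcount()

# Rows for user 'a' and product 'x' (indices 0, 4 and 10)
ax = df[(df['user_id'] == 'a') & (df['product'] == 'x')]
print(ax.assign(attempt=attempt.loc[ax.index]))
```

The 1 at index 10 only appears because it is the second row of the (a, x, False) group, not because of the earlier True.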
Answer:
IIUC, you could compute the cumsum of 'viewed' per group and subtract the row's own 'viewed' value (True counts as 1), so that a True only increments the count from the following row onward:
df['count'] = df.groupby(['user_id', 'product'])['viewed'].cumsum() - df['viewed']
output:
    user_id  product  viewed  count
0         a        x    True      0
1         a        y   False      0
2         a        z    True      0
3         b        x    True      0
4         a        x   False      1
5         c        t   False      0
6         d        x    True      0
7         a        y    True      0
8         e        x   False      0
9         c        z   False      0
10        a        x   False      1
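To see the arithmetic behind this answer, the sketch below keeps the per-group running cumsum as a separate, hypothetical `running` column. For the a/x rows (indices 0, 4, 10), 'viewed' is True, False, False, the running sum is 1, 1, 1, and subtracting 'viewed' (1, 0, 0) leaves 0, 1, 1.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'a', 'c', 'd', 'a', 'e', 'c', 'a'],
    'product': ['x', 'y', 'z', 'x', 'x', 't', 'x', 'y', 'x', 'z', 'x'],
    'viewed': [True, False, True, True, False, False, True, True, False, False, False],
})

# Number of True values in the (user_id, product) group up to and including each row
running = df.groupby(['user_id', 'product'])['viewed'].cumsum()

# Subtract the row's own 'viewed' (True == 1) so only earlier rows are counted
df['count'] = running - df['viewed']

# Show the a/x rows together with the intermediate running total
print(df.assign(running=running).loc[[0, 4, 10]])
```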
Answer:
If you need to count the True values from previous rows only (each True is counted starting from the next row of its group), use:
df['count'] = (df.groupby(['user_id', 'product'])['viewed']
.transform(lambda x: x.shift(fill_value=0).cumsum()))
print(df)
    user_id  product  viewed  count
0         a        x    True      0
1         a        y   False      0
2         a        z    True      0
3         b        x    True      0
4         a        x   False      1
5         c        t   False      0
6         d        x    True      0
7         a        y    True      0
8         e        x   False      0
9         c        z   False      0
10        a        x   False      1
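Both answers count the same thing: the first counts Trues up to and including the row and then removes the row itself, while the second shifts 'viewed' down by one within each group so the cumsum only ever sees earlier rows. A small sketch comparing them on the question's data; note that it passes fill_value=False (a variation of mine, not the answer's 0) so the shifted column stays boolean before summing.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'a', 'c', 'd', 'a', 'e', 'c', 'a'],
    'product': ['x', 'y', 'z', 'x', 'x', 't', 'x', 'y', 'x', 'z', 'x'],
    'viewed': [True, False, True, True, False, False, True, True, False, False, False],
})

groups = df.groupby(['user_id', 'product'])['viewed']

# First answer: inclusive running count minus the row itself
count_subtract = groups.cumsum() - df['viewed']

# Second answer (variation): shift within each group so a row only sees earlier rows.
# fill_value=False keeps the shifted values boolean; cumsum then yields integers.
count_shift = groups.transform(lambda s: s.shift(fill_value=False).cumsum())

print(pd.DataFrame({'subtract': count_subtract, 'shift': count_shift}))
```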