I found this very usefull post here and I'm trying to do same in a groupby df...
Here is the original post with an incremental counter every time df['step'] has a 6 : link
In my case I want to increment this counter everytime 1 occures
So I modified this request :
df['counter'] = ((df['step']==6) & (df.shift(1)['step']!=6 )).cumsum()
like this :
df['counter_2'] = ((df['counter1'] == 1) & (df.shift(1)['counter1'] != 1)).cumsum()
And now I try to make this in a grouping by ('prd_id') this counter_2 grouping by 'product_A', 'product_B', etc
CodePudding user response:
updated answer
df['counter'] = df['step'].eq(1).groupby(df['prd_id']).cumsum()
Output:
prd_id step counter
0 A 1 1
1 A 2 1
2 A 3 1
3 A 4 1
4 A 1 2
5 A 2 2
6 B 1 1
7 B 1 2
8 B 2 2
9 B 1 3
10 B 2 3
11 B 3 3
original answer
You can use duplicated
, the boolean NOT (~
), and cumsum
:
df['counter'] = (~df['step'].duplicated()).cumsum()
Output:
step counter
0 2 1
1 2 1
2 2 1
3 3 2
4 4 3
5 4 3
6 5 4
7 6 5
8 6 5
9 6 5
10 6 5
11 7 6
12 5 6 # not incrementing, 5 was seen above
13 6 6 # not incrementing, 6 was seen above
14 6 6
15 6 6
16 7 6 # not incrementing, 7 was seen above
17 5 6 # not incrementing, 5 was seen above
18 6 6 # not incrementing, 6 was seen above
19 7 6 # not incrementing, 7 was seen above
20 5 6 # not incrementing, 5 was seen above
If you also have groups, use:
df['counter'] = (~df[['step', 'group']].duplicated()).groupby(df['group']).cumsum()
Example:
group step counter
0 A 1 1
1 A 2 2
2 A 2 2
3 A 3 3
4 A 2 3
5 A 4 4
6 B 1 1 # first time in B
7 B 1 1
8 B 2 2
9 B 1 2 # duplicated in B
10 B 2 2
11 B 3 3