Home > database >  Increment counter the first time a number is reached in a groupby
Increment counter the first time a number is reached in a groupby

Time:11-26

I found this very usefull post here and I'm trying to do same in a groupby df...

Here is the original post with an incremental counter every time df['step'] has a 6 : link

In my case I want to increment this counter everytime 1 occures

So I modified this request :

df['counter'] = ((df['step']==6) & (df.shift(1)['step']!=6 )).cumsum()

like this :

df['counter_2'] = ((df['counter1'] == 1) & (df.shift(1)['counter1'] != 1)).cumsum()

And now I try to make this in a grouping by ('prd_id') this counter_2 grouping by 'product_A', 'product_B', etc

CodePudding user response:

updated answer

df['counter'] = df['step'].eq(1).groupby(df['prd_id']).cumsum()

Output:

   prd_id  step  counter
0       A     1        1
1       A     2        1
2       A     3        1
3       A     4        1
4       A     1        2
5       A     2        2
6       B     1        1
7       B     1        2
8       B     2        2
9       B     1        3
10      B     2        3
11      B     3        3

original answer

You can use duplicated, the boolean NOT (~), and cumsum:

df['counter'] = (~df['step'].duplicated()).cumsum()

Output:

    step  counter
0      2        1
1      2        1
2      2        1
3      3        2
4      4        3
5      4        3
6      5        4
7      6        5
8      6        5
9      6        5
10     6        5
11     7        6
12     5        6  # not incrementing, 5 was seen above
13     6        6  # not incrementing, 6 was seen above
14     6        6
15     6        6
16     7        6  # not incrementing, 7 was seen above
17     5        6  # not incrementing, 5 was seen above
18     6        6  # not incrementing, 6 was seen above
19     7        6  # not incrementing, 7 was seen above
20     5        6  # not incrementing, 5 was seen above

If you also have groups, use:

df['counter'] = (~df[['step', 'group']].duplicated()).groupby(df['group']).cumsum()

Example:

   group  step  counter
0      A     1        1
1      A     2        2
2      A     2        2
3      A     3        3
4      A     2        3
5      A     4        4
6      B     1        1  # first time in B
7      B     1        1
8      B     2        2
9      B     1        2  # duplicated in B
10     B     2        2
11     B     3        3
  • Related