Home > OS >  Python Pandas - Count consecutive grouped rows that satisfy condition and reset counter when false
Python Pandas - Count consecutive grouped rows that satisfy condition and reset counter when false

Time:04-12

I need to add a new column labelled "counter" to the existing dataframe that will be calculated as shows in the example below:

symbol percentage ??? counter ???
A 11 -1
A 2 0
A 5 1
B 2 0
B 1 1
B 3 2
A 2 2
A 9 -1
A 4 0
B 2 3
B 8 -1
B 7 -1

So the data is grouped per "symbol" and the logic for calculating the "counter" is like this:

  • if the "percentage" is greater than 5, then "counter" is equal to -1
  • if the "percentage" is less than 5, then we start counter with 0, if the next row for the same symbol is again less than 5, we increase the counter
  • if the next row "percentage" is again greater than 5, we break the counting and set the "counter" column again to -1

I've tried something like this, but it's not good, since the reset is not working:

df['counter'] = np.where(df['percentage'] > 5, -1, df.groupby('symbol').cumcount())

CodePudding user response:

IIUC, you can use a mask and a custom groupby:

m = df['percentage'].gt(5)
group = m.groupby(df['symbol']).apply(lambda s: s.ne(s.shift()).cumsum())

df['count'] = (df
               .groupby(['symbol', group])
               .cumcount()
               .mask(m, -1)
               )

Output:

   symbol  percentage  counter
0       A          11       -1
1       A           2        0
2       A           5        1
3       B           2        0
4       B           1        1
5       B           3        2
6       A           2        2
7       A           9       -1
8       A           4        0
9       B           2        3
10      B           8       -1
11      B           7       -1
  • Related