Python Pandas - Count consecutive grouped rows that satisfy condition and reset counter when false-CodePudding

I need to add a new column labelled "counter" to the existing dataframe that will be calculated as shows in the example below:

symbol	percentage	??? counter ???
A	11	-1
A	2	0
A	5	1
B	2	0
B	1	1
B	3	2
A	2	2
A	9	-1
A	4	0
B	2	3
B	8	-1
B	7	-1

So the data is grouped per "symbol" and the logic for calculating the "counter" is like this:

if the "percentage" is greater than 5, then "counter" is equal to -1
if the "percentage" is less than 5, then we start counter with 0, if the next row for the same symbol is again less than 5, we increase the counter
if the next row "percentage" is again greater than 5, we break the counting and set the "counter" column again to -1

I've tried something like this, but it's not good, since the reset is not working:

df['counter'] = np.where(df['percentage'] > 5, -1, df.groupby('symbol').cumcount())

CodePudding user response：

IIUC, you can use a mask and a custom groupby:

m = df['percentage'].gt(5)
group = m.groupby(df['symbol']).apply(lambda s: s.ne(s.shift()).cumsum())

df['count'] = (df
               .groupby(['symbol', group])
               .cumcount()
               .mask(m, -1)
               )

Output:

   symbol  percentage  counter
0       A          11       -1
1       A           2        0
2       A           5        1
3       B           2        0
4       B           1        1
5       B           3        2
6       A           2        2
7       A           9       -1
8       A           4        0
9       B           2        3
10      B           8       -1
11      B           7       -1