I need to add a new column labelled "counter" to the existing dataframe that will be calculated as shows in the example below:
symbol | percentage | ??? counter ??? |
---|---|---|
A | 11 | -1 |
A | 2 | 0 |
A | 5 | 1 |
B | 2 | 0 |
B | 1 | 1 |
B | 3 | 2 |
A | 2 | 2 |
A | 9 | -1 |
A | 4 | 0 |
B | 2 | 3 |
B | 8 | -1 |
B | 7 | -1 |
So the data is grouped per "symbol" and the logic for calculating the "counter" is like this:
- if the "percentage" is greater than 5, then "counter" is equal to -1
- if the "percentage" is less than 5, then we start counter with 0, if the next row for the same symbol is again less than 5, we increase the counter
- if the next row "percentage" is again greater than 5, we break the counting and set the "counter" column again to -1
I've tried something like this, but it's not good, since the reset is not working:
df['counter'] = np.where(df['percentage'] > 5, -1, df.groupby('symbol').cumcount())
CodePudding user response:
IIUC, you can use a mask and a custom groupby
:
m = df['percentage'].gt(5)
group = m.groupby(df['symbol']).apply(lambda s: s.ne(s.shift()).cumsum())
df['count'] = (df
.groupby(['symbol', group])
.cumcount()
.mask(m, -1)
)
Output:
symbol percentage counter
0 A 11 -1
1 A 2 0
2 A 5 1
3 B 2 0
4 B 1 1
5 B 3 2
6 A 2 2
7 A 9 -1
8 A 4 0
9 B 2 3
10 B 8 -1
11 B 7 -1