I want to add a new column called "counter" that counts rows by condition. In my case, if the value is greater than zero, the counter resets to 0; when the value is less than or equal to zero, it increases by one within its group.
I've written code for that, but I also want the counter to be NaN for the leading rows of each group, until the first value greater than zero appears.
import pandas as pd
df = pd.DataFrame({
    'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
    'val': [-2, -2, -1, 2, 3, 3, -7, -5, 6, 2, 8]
})
condition = df['val'].gt(0)
group = condition.groupby(df['gr']).cumsum()
df['counter'] = (~condition).groupby([df['gr'], group]).cumsum()
print(df)
Output:
gr val counter
0 a -2 1
1 a -2 2
2 a -1 3
3 a 2 0
4 b 3 0
5 b 3 0
6 a -7 1
7 a -5 2
8 a 6 0
9 a 2 0
10 b 8 0
But I want this output:
gr val counter
0 a -2 NaN
1 a -2 NaN
2 a -1 NaN
3 a 2 0
4 b 3 0
5 b 3 0
6 a -7 1
7 a -5 2
8 a 6 0
9 a 2 0
10 b 8 0
EDIT: Second example:
df = pd.DataFrame({
'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
'val': [-2, -2, -1, 2, -3, -3, 7, -5, 6, 2, 8]
})
condition = df['val'].gt(0)
group = condition.groupby(df['gr']).cumsum()
df['counter'] = (~condition).groupby([df['gr'], group]).cumsum()
print(df)
Actual Output:
gr val counter
0 a -2 1
1 a -2 2
2 a -1 3
3 a 2 0
4 b -3 1
5 b -3 2
6 a 7 0
7 a -5 1
8 a 6 0
9 a 2 0
10 b 8 0
Expected output:
gr val counter
0 a -2 NaN
1 a -2 NaN
2 a -1 NaN
3 a 2 0
4 b -3 NaN
5 b -3 NaN
6 a 7 0
7 a -5 1
8 a 6 0
9 a 2 0
10 b 8 0
CodePudding user response:
At the end of your code, you could select the leading entries in column counter, up to the index returned by idxmax (applied to the condition on column val) minus 1, and assign them np.nan:
import numpy as np
n = 0
df.loc[:df.val.gt(n).idxmax() - 1, 'counter'] = np.nan
print(df)
gr val counter
0 a -2 NaN
1 a -2 NaN
2 a -1 NaN
3 a 2 0.0
4 b 3 0.0
5 b 3 0.0
6 a -7 1.0
7 a -5 2.0
8 a 6 0.0
9 a 2 0.0
10 b 8 0.0
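If all (or none) of your values are greater than n, df.val.gt(n).idxmax()-1 evaluates to -1, so the selection becomes df.loc[:-1], which is empty on a default integer index. In those cases no changes are applied. That is the right result when every value is greater than n; when none is, the counter is left as computed rather than set to NaN.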
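To make the edge cases above concrete, here is a minimal sketch that checks whether the idxmax-based slice selects anything. The helper name leading_slice_is_empty is made up for illustration, and it assumes a default RangeIndex (0, 1, 2, ...), as in the examples here.

```python
import pandas as pd

def leading_slice_is_empty(values, n=0):
    """Hypothetical helper: True if the idxmax-based slice selects no rows."""
    s = pd.Series(values)         # default RangeIndex: 0, 1, 2, ...
    first_hit = s.gt(n).idxmax()  # label of the first True; the first label if there is no True
    return s.loc[:first_hit - 1].empty

print(leading_slice_is_empty([1, 2, 3]))     # all values > n   -> True
print(leading_slice_is_empty([-1, -2, -3]))  # no value > n     -> True
print(leading_slice_is_empty([-1, 2, 3]))    # one leading row  -> False
```

Note that this slice is taken over the whole frame, not per group, which is why it only covers the first example.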
Update to accommodate the second example. There is probably a quicker or better way to achieve this, but the following should work:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
'val': [-2, -2, -1, 2, -3, -3, 7, -5, 6, 2, 8]
})
condition = df['val'].gt(0)
group = condition.groupby(df['gr']).cumsum()
df['counter'] = (~condition).groupby([df['gr'], group]).cumsum()
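# set n, select from df only where col `val` > n, then groupby and get the
# index of the first such row in each group
# (assumes every group has at least one val > n, so that idxs lines up
# with gr_idxs below)
n = 0
idxs = df[df.val > n].index.to_series().groupby(df['gr']).first().values
# so: [3, 10], i.e. for group a the first (val > n) is at index 3, for b at 10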
# next get a list with indices for each group
gr_idxs = list(df.groupby('gr')['val'].indices.values())
# [array([0, 1, 2, 3, 6, 7, 8, 9], dtype=int64),
# array([ 4, 5, 10], dtype=int64)]
# now we can iterate over the list, and isolate index vals < idxs
for i, gr in enumerate(gr_idxs):
    idx_slice = gr[:np.where(gr == idxs[i])[0][0]]
    # so, for A: [0 1 2], for B: [4 5]
    # finally, we use .loc with slice and col 'counter' and assign np.nan
    df.loc[idx_slice, 'counter'] = np.nan
print(df)
gr val counter
0 a -2 NaN
1 a -2 NaN
2 a -1 NaN
3 a 2 0.0
4 b -3 NaN
5 b -3 NaN
6 a 7 0.0
7 a -5 1.0
8 a 6 0.0
9 a 2 0.0
10 b 8 0.0
This should also work on a group that has no leading rows to be filled. For example, we might have:
df = pd.DataFrame({
'gr': ['c' ,'c', 'c'],
'val': [1, -1, -2]
})
This would lead to:
gr val counter
0 c 1 0.0
1 c -1 1.0
2 c -2 2.0
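As a possible shortcut for the loop above, the per-group cumulative sum that the question already computes (group) can double as a mask: it stays at 0 until the first value greater than n appears in a group. This is a different technique from the loop, offered as a sketch only; the function name counter_after_first_positive is made up for illustration, and the column names gr and val are taken from the examples.

```python
import pandas as pd

def counter_after_first_positive(df, n=0):
    """Hypothetical helper: counter that is NaN until a group's first val > n."""
    condition = df['val'].gt(n)
    # running count of (val > n) rows per group; 0 means none seen yet
    group = condition.groupby(df['gr']).cumsum()
    # count the (val <= n) rows since the last (val > n) row, per group
    counter = (~condition).groupby([df['gr'], group]).cumsum()
    # keep the counter only where the group has already seen a val > n
    return counter.where(group.gt(0))

example = pd.DataFrame({
    'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
    'val': [-2, -2, -1, 2, -3, -3, 7, -5, 6, 2, 8]
})
example['counter'] = counter_after_first_positive(example)
print(example)
```

With this approach, a group that never has a value greater than n ends up with NaN in every row, so no special handling is needed for that case.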