Pandas - Counter by condition (until first occurrence set NaN)

Time:08-03

I want to add a new column called "counter" that counts rows by condition. In my case, if the value is greater than zero, the counter is reset to 0; when the value is less than or equal to zero, the counter increases by one within its group.

I've written code for that, but I also want the counter to be NaN in each group until the first value greater than zero occurs.

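import pandas as pd

df = pd.DataFrame({
    'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
    'val': [-2, -2, -1, 2, 3, 3, -7, -5, 6, 2, 8]
})

# True where the value is positive (these rows reset the counter)
condition = df['val'].gt(0)
# number of positive values seen so far within each 'gr' group
group = condition.groupby(df['gr']).cumsum()
# count the non-positive rows within each (gr, group) block
df['counter'] = (~condition).groupby([df['gr'], group]).cumsum()
print(df)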

Output:
   gr  val  counter
0   a   -2        1
1   a   -2        2
2   a   -1        3
3   a    2        0
4   b    3        0
5   b    3        0
6   a   -7        1
7   a   -5        2
8   a    6        0
9   a    2        0
10  b    8        0

But I want this as an output:
   gr  val  counter
0   a   -2      NaN
1   a   -2      NaN
2   a   -1      NaN
3   a    2        0
4   b    3        0
5   b    3        0
6   a   -7        1
7   a   -5        2
8   a    6        0
9   a    2        0
10  b    8        0

EDIT: Second example:

df = pd.DataFrame({
    'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
    'val': [-2, -2, -1, 2, -3, -3, 7, -5, 6, 2, 8]
})
condition = df['val'].gt(0)
group = condition.groupby(df['gr']).cumsum()
df['counter'] = (~condition).groupby([df['gr'], group]).cumsum()
print(df)

Actual Output:
   gr  val  counter
0   a   -2        1
1   a   -2        2
2   a   -1        3
3   a    2        0
4   b   -3        1
5   b   -3        2
6   a    7        0
7   a   -5        1
8   a    6        0
9   a    2        0
10  b    8        0


Expected output:
   gr  val  counter
0   a   -2      NaN
1   a   -2      NaN
2   a   -1      NaN
3   a    2        0
4   b   -3      NaN
5   b   -3      NaN
6   a    7        0
7   a   -5        1
8   a    6        0
9   a    2        0
10  b    8        0

Answer:

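At the end of your code, you could select the leading entries in column counter, up to the index returned by Series.idxmax (applied to the condition on column val) minus 1, and assign them np.nan:

import numpy as np

n = 0
# rows before the first row where val > n become NaN
df.loc[:df['val'].gt(n).idxmax() - 1, 'counter'] = np.nan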

print(df)

   gr  val  counter
0   a   -2      NaN
1   a   -2      NaN
2   a   -1      NaN
3   a    2      0.0
4   b    3      0.0
5   b    3      0.0
6   a   -7      1.0
7   a   -5      2.0
8   a    6      0.0
9   a    2      0.0
10  b    8      0.0

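If all (or none) of your values are greater than n, df.val.gt(n).idxmax() returns the first index label (0 with the default index), so df.val.gt(n).idxmax()-1 leads to df.loc[:-1], which is an empty selection. I.e. in those cases no changes will be applied. That is correct when all values are greater than n; when none are, the counter column simply keeps its counts instead of becoming NaN.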
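The edge case described above can be checked with a small sketch. The two frames are made-up illustration data, not part of the question, and the check assumes the default integer index (0, 1, 2, ...):

```python
import pandas as pd

n = 0
# Hypothetical frames for illustration only
all_positive = pd.DataFrame({'val': [1, 2, 3]})      # every value > n
none_positive = pd.DataFrame({'val': [-1, -2, -3]})  # no value > n

results = []
for frame in (all_positive, none_positive):
    mask = frame['val'].gt(n)
    first = mask.idxmax()             # first label of a constant mask
    selected = frame.loc[:first - 1]  # label slice ending at -1
    results.append((int(first), len(selected)))

print(results)
```

In both cases the slice selects no rows, so an assignment through it changes nothing.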


Update to accommodate the second example. I trust there is a quicker/better way to achieve this, but the following should work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
    'val': [-2, -2, -1, 2, -3, -3, 7, -5, 6, 2, 8]
})
condition = df['val'].gt(0)
group = condition.groupby(df['gr']).cumsum()
df['counter'] = (~condition).groupby([df['gr'], group]).cumsum()

# set n, keep only the rows where `val` > n, then group by `gr` and take the
# first index label in each group
n = 0
idxs = df[df.val > n].index.to_series().groupby(df['gr']).first().values
# so: [3, 10], i.e. for group 'a' the first (val > n) is at index 3, for 'b' at 10

# next get a list with indices for each group
gr_idxs = list(df.groupby('gr')['val'].indices.values())
# [array([0, 1, 2, 3, 6, 7, 8, 9], dtype=int64),
#  array([ 4,  5, 10], dtype=int64)]

# now we can iterate over the list, and isolate index vals < idxs

for i, gr in enumerate(gr_idxs):
    idx_slice = gr[:np.where(gr == idxs[i])[0][0]]
    # so, for A: [0 1 2], for B: [4 5]
    
    # finally, we use .loc with slice and col 'counter' and assign np.nan
    df.loc[idx_slice,'counter'] = np.nan

print(df)

   gr  val  counter
0   a   -2      NaN
1   a   -2      NaN
2   a   -1      NaN
3   a    2      0.0
4   b   -3      NaN
5   b   -3      NaN
6   a    7      0.0
7   a   -5      1.0
8   a    6      0.0
9   a    2      0.0
10  b    8      0.0

This should also work for a group that has no leading rows to fill. For example, we might have:

df = pd.DataFrame({
    'gr': ['c' ,'c', 'c'],
    'val': [1, -1, -2]
})

This would lead to:

  gr  val  counter
0  c    1      0.0
1  c   -1      1.0
2  c   -2      2.0
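
As a possible shorter alternative to the loop (a sketch, not the answer's method): the `group` series from the question already counts how many positive values have occurred so far in each group, so rows where it is still 0 are exactly the rows before the first positive value. Masking the counter with `Series.where` on that condition avoids the per-group index bookkeeping. The intermediate name `counter` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'gr': ['a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b'],
    'val': [-2, -2, -1, 2, -3, -3, 7, -5, 6, 2, 8]
})

condition = df['val'].gt(0)
# running count of positive values seen so far within each 'gr' group
group = condition.groupby(df['gr']).cumsum()
# count non-positive rows within each (gr, group) block
counter = (~condition).groupby([df['gr'], group]).cumsum()
# group == 0 means no positive value has occurred yet in that group,
# so those rows are replaced by NaN
df['counter'] = counter.where(group.gt(0))

print(df)
```

With this approach a group that never contains a positive value ends up entirely NaN, and a group that starts with a positive value is left unchanged.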