Home > Software design >  Pandas - how to aggregate values between 2 ranges in a specific column
Pandas - how to aggregate values between 2 ranges in a specific column

Time:12-08

I'm working on a df with 2 columns e.g.

column1 = [False, False, False, True, False, False, True]
column2 = [1, 1, 1, 1, 1, 1, 1]

I want to sum all "False" values until the first "True" value, and again sum the following "False" values until the next "True" etc.

The output should be

column3 = [0,0,0,3,0,0,2]

I tried to sum column values but I can't "reset" the counter once hitting a "True" from a different column

CodePudding user response:

You can use:

df['column3'] = (df['column2']
 .mask(df['column1']) # get False values only
 .groupby(df.loc[::-1, 'column1'].cumsum()) # group with next True
 # get sum of False values only where True
 .transform('sum').where(df['column1'], 0).convert_dtypes()
)

Output:

   column1  column2  column3
0    False        1        0
1    False        1        0
2    False        1        0
3     True        1        3
4    False        1        0
5    False        1        0
6     True        1        2
  • Related