Pandas: sum next 5 items of dataframe after some specific item

Time:11-27

I have a DataFrame that is essentially a single column of numbers:

original  option 1  option 2
       1       NaN       NaN
      -1       NaN         9
       4       NaN       NaN
      -1       NaN        15
       6         9       NaN
       7       NaN       NaN
       2        15       NaN
       3       NaN       NaN
       0       NaN       NaN

I need to sum the next 3 values of the DataFrame after each negative value; see the "option 1" and "option 2" columns for the two acceptable layouts. It would also work if I got only the sums, i.e. a separate data structure that looks like [9, 15].

Any thoughts?

CodePudding user response:

One approach could be as follows:

import pandas as pd

data = {'original': {0: 1, 1: -1, 2: 4, 3: -1, 4: 6, 5: 7, 6: 2, 7: 3, 8: 0}}
df = pd.DataFrame(data)

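n = 3  # number of values to sum after each negative value

# Rolling sum of the last n values, kept only where the value n rows back is negative
df['option 1'] = (df['original'].rolling(n).sum()
                  .where(df['original'].shift(n).lt(0))
                  )

# Move each sum up n rows so it sits on the row of the negative value
df['option 2'] = df['option 1'].shift(-n)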

print(df)

   original  option 1  option 2
0         1       NaN       NaN
1        -1       NaN       9.0
2         4       NaN       NaN
3        -1       NaN      15.0
4         6       9.0       NaN
5         7       NaN       NaN
6         2      15.0       NaN
7         3       NaN       NaN
8         0       NaN       NaN

Explanation

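  • First, use Series.rolling to create a rolling window of n rows and apply sum to it. Each result is the sum of the current value and the n - 1 values before it.
  • Next, chain Series.where and pass as its cond parameter a less-than-zero check (lt) on column original shifted down by n rows (shift). This keeps a sum only where the value n rows earlier is negative, i.e. where the window covers exactly the n values that follow a negative one.
  • For option 2, we simply apply a negative shift (shift(-n)) to option 1, which moves each sum up to the row of the negative value it belongs to.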
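If only the sums are needed, as in the [9, 15] variant mentioned in the question, the same steps can be split up and the NaN placeholders dropped at the end. This is a minimal sketch rather than part of the original answer; the names rolling_sum, window_follows_negative and sums are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'original': [1, -1, 4, -1, 6, 7, 2, 3, 0]})
n = 3  # number of values to sum after each negative value

# Sum of each value together with the n - 1 values before it.
rolling_sum = df['original'].rolling(n).sum()

# True where the value n rows earlier is negative, i.e. where the window
# ending at this row covers exactly the n values after a negative one.
window_follows_negative = df['original'].shift(n).lt(0)

# Keep only the qualifying sums, drop the NaN rows, and convert to a list.
sums = (rolling_sum.where(window_follows_negative)
        .dropna()
        .astype(int)
        .tolist())

print(sums)  # [9, 15]
```

Note that a negative value falling inside a window is included in that window's sum (the first result is 4 + (-1) + 6 = 9), and a negative value with fewer than n rows after it produces no sum at all.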