How to roll window using future values instead of past values in Pandas

Time:10-12

Pandas seems to use previous rows instead of the next rows when it creates rolling windows. To illustrate, I will use this data:

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')
            num
date           
2022-01-01   20
2022-01-02   27
2022-01-03   19
2022-01-04   23
2022-01-05   24
2022-01-06   26
2022-01-07   21

I am trying to create a new column that, for each day, holds the maximum percentage change over the next two days relative to that day. For example, on 2022-01-01, within a window of the next two days, the number has increased from 20 to 27, which is a 35% increase. This is the code:

df['max_increase'] = df['num'].rolling(3).apply(
    lambda xs: max(xs[1:] - xs[0]) / xs[0]
)
            num  max_increase
date                         
2022-01-01   20           NaN
2022-01-02   27           NaN
2022-01-03   19      0.350000
2022-01-04   23     -0.148148
2022-01-05   24      0.263158
2022-01-06   26      0.130435
2022-01-07   21      0.083333

However, that is not what I want. I do not want the NaN values. I want the 35% increase (i.e. 0.350000) to be associated with 2022-01-01, not with 2022-01-03. Note that df['max_increase'] = df['max_increase'].shift(-2) will not fix the problem, since it will produce NaN values for the last two rows. This is what I want:

            num  max_increase
date                         
2022-01-01   20      0.350000  # Increase from 20 to 27.
2022-01-02   27     -0.148148  # Decrease from 27 to 23.
2022-01-03   19      0.263158  # Increase from 19 to 24.
2022-01-04   23      0.130435  # Increase from 23 to 26.
2022-01-05   24      0.083333  # Increase from 24 to 26.
2022-01-06   26     -0.192308  # Decrease from 26 to 21.
2022-01-07   21      0.000000

The problem appears to stem from how Pandas builds the rolling windows: each 3-day window ends at the current row and looks back at previous rows instead of looking ahead at future rows:

  • 2022-01-01
  • 2022-01-01, 2022-01-02
  • 2022-01-01, 2022-01-02, 2022-01-03
  • 2022-01-02, 2022-01-03, 2022-01-04
  • 2022-01-03, 2022-01-04, 2022-01-05
  • 2022-01-04, 2022-01-05, 2022-01-06
  • 2022-01-05, 2022-01-06, 2022-01-07
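These windows can be inspected by iterating over the rolling object. The following is a minimal sketch, assuming pandas 1.1 or later (where rolling objects are iterable); the `windows` variable name is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

# Each iteration yields the sub-Series that makes up one window;
# collect the dates of every window as plain strings.
windows = [list(w.index.strftime('%Y-%m-%d')) for w in df['num'].rolling(3)]

for w in windows:
    print(', '.join(w))
```

Iteration also yields the partial windows at the start. `rolling(3).apply(...)` still returns NaN for those rows because `min_periods` defaults to the window size when the window is an integer.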

How do I make Pandas create a forward rolling window instead? i.e.

  • 2022-01-01, 2022-01-02, 2022-01-03
  • 2022-01-02, 2022-01-03, 2022-01-04
  • 2022-01-03, 2022-01-04, 2022-01-05
  • 2022-01-04, 2022-01-05, 2022-01-06
  • 2022-01-05, 2022-01-06, 2022-01-07
  • 2022-01-06, 2022-01-07
  • 2022-01-07

CodePudding user response:

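Reverse the series so that future rows become previous rows, take a rolling max that excludes the current row (via shift inside the window), then compute the change relative to the current value: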

df['max_increase'] = (df.loc[::-1, 'num']
                        .rolling(3, min_periods=1)
                        .apply(lambda x: x.shift().max())
                        .sub(df['num'])
                        .div(df['num']).fillna(0)
                     )

Output:

            num  max_increase
date                         
2022-01-01   20      0.350000
2022-01-02   27     -0.148148
2022-01-03   19      0.263158
2022-01-04   23      0.130435
2022-01-05   24      0.083333
2022-01-06   26     -0.192308
2022-01-07   21      0.000000
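The same reversed-window idea can also be written without `apply`, which may be faster on large frames. This is a sketch and not part of the answer above: it reverses the series, shifts it by one row so the current day is excluded, and takes a 2-row rolling max. The name `future_max` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

# Reverse so that "next rows" become "previous rows", drop the current row
# from each window with shift(), then take the max over the next two days.
future_max = df['num'][::-1].shift().rolling(2, min_periods=1).max()

# Arithmetic and column assignment align on the date index,
# so the reversed order of future_max does not matter.
# The last row has no future values, so its NaN is replaced with 0.
df['max_increase'] = future_max.sub(df['num']).div(df['num']).fillna(0)

print(df)
```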

CodePudding user response:

Another option is to pad the DataFrame by appending its last row n-1 times, apply the same rolling function, and then shift the result back by n-1 rows:

n = 3  # window size

# raw=True passes NumPy arrays to the lambda, so xs[0] is positional.
df['max_increase'] = (pd.concat([df, pd.concat([df[-1:]]*(n-1))])['num']
    .rolling(n).apply(lambda xs: max(xs[1:] - xs[0]) / xs[0], raw=True)
    .shift(-(n-1)).dropna())

Output:

            num  max_increase
date                         
2022-01-01   20      0.350000
2022-01-02   27     -0.148148
2022-01-03   19      0.263158
2022-01-04   23      0.130435
2022-01-05   24      0.083333
2022-01-06   26     -0.192308
2022-01-07   21      0.000000
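A third approach, not taken from either answer above, is to use the forward-looking indexer that pandas provides, `pandas.api.indexers.FixedForwardWindowIndexer` (available since pandas 1.1, as far as I know). The sketch below assumes that a single-row window (the last day) should yield 0, matching the desired output; the helper name `max_increase` is illustrative.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

# Each window covers the current row plus the next two rows;
# windows near the end of the data are simply shorter.
indexer = FixedForwardWindowIndexer(window_size=3)

def max_increase(xs):
    # xs is a NumPy array (raw=True); xs[0] is the current day.
    if len(xs) == 1:
        return 0.0  # assumption: no future rows means no change
    return (xs[1:].max() - xs[0]) / xs[0]

df['max_increase'] = (
    df['num']
    .rolling(indexer, min_periods=1)
    .apply(max_increase, raw=True)
)

print(df)
```

With `min_periods=1`, the shortened windows at the end of the data are still evaluated instead of producing NaN, so no padding, reversing, or shifting is needed.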