How to roll window using future values instead of past values in Pandas


Pandas seems to use previous rows instead of the next rows when it creates rolling windows. To illustrate, I will use this data:

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')  # use the dates as the index

            num
2022-01-01   20
2022-01-02   27
2022-01-03   19
2022-01-04   23
2022-01-05   24
2022-01-06   26
2022-01-07   21

I am trying to create a new column that, for each day, holds the maximum percentage change over the next two days relative to that day. For example, on 2022-01-01, within a window of the next two days, the number increased from 20 to 27, which is a 35% increase. This is the code:

df['max_increase'] = df['num'].rolling(3).apply(
    lambda xs: max(xs[1:] - xs[0]) / xs[0],
    raw=True,  # pass NumPy arrays so that xs[0] is positional
)

            num  max_increase
2022-01-01   20           NaN
2022-01-02   27           NaN
2022-01-03   19      0.350000
2022-01-04   23     -0.148148
2022-01-05   24      0.263158
2022-01-06   26      0.130435
2022-01-07   21      0.083333

However, that is not what I want. I do not want the NaN values. I want the 35% increase (i.e. 0.350000) to be associated with 2022-01-01, not with 2022-01-03. Note that df['max_increase'] = df['max_increase'].shift(-2) will not fix the problem, since it will produce NaN values for the last two rows. This is what I want:

            num  max_increase
2022-01-01   20      0.350000  # Increase from 20 to 27.
2022-01-02   27     -0.148148  # Decrease from 27 to 23.
2022-01-03   19      0.263158  # Increase from 19 to 24.
2022-01-04   23      0.130435  # Increase from 23 to 26.
2022-01-05   24      0.083333  # Increase from 24 to 26.
2022-01-06   26     -0.192308  # Decrease from 26 to 21.
2022-01-07   21      0.000000
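
The limitation of shift(-2) mentioned above can be checked directly. This is a minimal sketch that rebuilds the sample data; the names backward and shifted are only illustrative, and raw=True is assumed so that the lambda receives NumPy arrays.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

# Backward-looking result: each value is stored on the LAST day of its window.
backward = df['num'].rolling(3).apply(
    lambda xs: max(xs[1:] - xs[0]) / xs[0],
    raw=True,
)

# Shifting by -2 moves each value to the FIRST day of its window,
# but the last two rows have nothing to pull from and become NaN.
shifted = backward.shift(-2)
print(shifted)
```

The first five rows line up with the desired output, while the partial windows at the end are lost.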

The problem appears to stem from how Pandas builds the rolling windows. It seems to create these 3-day windows from previous rows instead of future rows:

  • 2022-01-01
  • 2022-01-01, 2022-01-02
  • 2022-01-01, 2022-01-02, 2022-01-03
  • 2022-01-02, 2022-01-03, 2022-01-04
  • 2022-01-03, 2022-01-04, 2022-01-05
  • 2022-01-04, 2022-01-05, 2022-01-06
  • 2022-01-05, 2022-01-06, 2022-01-07
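
The list above can be reproduced by iterating over the Rolling object, which yields one window per row. This is only a sketch: it assumes a pandas version in which Rolling objects are iterable (1.1 or later, as far as I know), and the name windows is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

# Each window is a Series that ends at the current row and reaches
# back over the earlier rows. Iteration also yields the short windows
# at the start, even though rolling(3) aggregations return NaN for them.
windows = [list(w.index.strftime('%Y-%m-%d')) for w in df['num'].rolling(3)]

for dates in windows:
    print(', '.join(dates))
```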

How do I make Pandas create a forward rolling window instead? i.e.

  • 2022-01-01, 2022-01-02, 2022-01-03
  • 2022-01-02, 2022-01-03, 2022-01-04
  • 2022-01-03, 2022-01-04, 2022-01-05
  • 2022-01-04, 2022-01-05, 2022-01-06
  • 2022-01-05, 2022-01-06, 2022-01-07
  • 2022-01-06, 2022-01-07
  • 2022-01-07

User response:

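Use a shifted rolling max on the reversed series, so that each trailing window covers the following rows, then divide by the current value:

df['max_increase'] = (df.loc[::-1, 'num']
                        .rolling(3, min_periods=1)
                        .apply(lambda x: x.shift().max())  # max of the next two days
                        .div(df['num']).sub(1)             # change relative to the current day
                        .fillna(0)                         # the last day has no later rows
                     )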


            num  max_increase
2022-01-01   20      0.350000
2022-01-02   27     -0.148148
2022-01-03   19      0.263158
2022-01-04   23      0.130435
2022-01-05   24      0.083333
2022-01-06   26     -0.192308
2022-01-07   21      0.000000
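
The same idea can be spelled out step by step without apply, which may be easier to follow and avoids calling a Python function per window. This is a sketch, not the answer's exact code: the names next_day and future_max are illustrative, and it assumes all values are positive, so that max(x - x0) / x0 equals max(x) / x0 - 1.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

n = 3  # window size: the current day plus the next two days

# Step 1: shift by -1 so that each row holds the NEXT day's value.
next_day = df['num'].shift(-1)

# Step 2: reverse, take a trailing max over n-1 rows, and restore the order.
# On the reversed series, a trailing window covers rows that come later
# in the original order.
future_max = next_day.iloc[::-1].rolling(n - 1, min_periods=1).max().iloc[::-1]

# Step 3: change relative to the current day; the last day has no
# later rows, so its NaN is replaced by 0.
df['max_increase'] = (future_max / df['num'] - 1).fillna(0)
print(df)
```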

User response:

Another option is to append n-1 copies of the last row to the original dataframe, apply the rolling function, and then shift the results back:

n = 3  # window size

df['max_increase'] = (pd.concat([df, pd.concat([df[-1:]]*(n-1))])['num']
    .rolling(n).apply(lambda xs: max(xs[1:] - xs[0]) / xs[0], raw=True)
    .shift(-(n-1))   # move each result to the first row of its window
    .iloc[:len(df)]  # drop the padding rows
)


            num  max_increase
2022-01-01   20      0.350000
2022-01-02   27     -0.148148
2022-01-03   19      0.263158
2022-01-04   23      0.130435
2022-01-05   24      0.083333
2022-01-06   26     -0.192308
2022-01-07   21      0.000000
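
A further route, not taken from the responses above, is the forward-looking indexer that pandas ships in pandas.api.indexers, which builds exactly the forward windows listed in the question. The sketch below assumes a pandas version that provides FixedForwardWindowIndexer (1.1 or later, as far as I know); the helper max_future_increase is a hypothetical name introduced here.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', '2022-01-07'),
    'num': [20, 27, 19, 23, 24, 26, 21],
}).set_index('date')

n = 3  # window size: the current day plus the next two days


def max_future_increase(xs):
    # xs[0] is the current day; xs[1:] holds the following days,
    # which may be empty for the last row.
    if len(xs) < 2:
        return 0.0
    return (xs[1:].max() - xs[0]) / xs[0]


# Windows start at the current row and extend forward; min_periods=1
# keeps the shorter windows at the end instead of returning NaN.
indexer = FixedForwardWindowIndexer(window_size=n)
df['max_increase'] = (
    df['num']
    .rolling(window=indexer, min_periods=1)
    .apply(max_future_increase, raw=True)
)
print(df)
```

No reversing, padding, or shifting is needed here, because each result is already stored on the first day of its window.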