Home > database >  Rolling windows in Pandas: How to wrap around with DatetimeIndex?
Rolling windows in Pandas: How to wrap around with DatetimeIndex?

Time:07-06

I have a DataFrame where the index is a DatetimeIndex with a daily frequency. It contains 365 rows, one for each day of the year. When computing rolling sums, the first few elements are always NaN (as expected), but I'd like them to have actual values.

For example, if using a rolling window of 3 samples, the value for Jan 1 should be the sum of Dec 30, Dec 31, and Jan 1. Similarly, the value for Jan 2 should be the sum of Dec 31, Jan 1, and Jan 2.

I have looked into all parameters of the rolling function in Pandas, and could not find anything that would provide this wrapping. Any help would be appreciated. The code below is a minimal example illustrating the behavior of rolling.

import numpy as np
import pandas as pd

fake_data = pd.DataFrame(index=pd.date_range('2022-1-1', '2022-12-31', freq='D'),
                         data=np.random.random(365))
rolling_fake_data = fake_data.rolling(3).sum()

CodePudding user response:

You can just pass to rolling the min_periods argument. It would look like

import numpy as np
import pandas as pd

fake_data = pd.DataFrame(index=pd.date_range('2022-1-1', '2022-12-31', freq='D'),
                         data=np.random.random(365))
rolling_fake_data = fake_data.rolling(3, min_periods=0).sum()

What will happen is that for the first rows where you don't have the window with its complete size, i.e. 3, the sum will be performed with the window up to that point, thus first element would be the first element itself, second element would be first second, and from the third onwards the window would have the full size so would compute as expected.

CodePudding user response:

You're basically asking for a circular data object which has no start or end. Not sure that exists!

The best work-around I can think of is to repeat the end of the series before the beginning.

n = 3
rolling_fake_data = (
    pd.concat([fake_data[-n:], fake_data])
).rolling(n).sum()[n:]

# Test
assert(rolling_fake_data.loc["2022-01-01", 0] ==
    fake_data.loc[["2022-12-30", "2022-12-31", "2022-01-01"], 0].sum())
  • Related