Home > Software engineering >  pandas rolling on specific column
pandas rolling on specific column

Time:12-22

I'm trying something very simple, seemingly at least which is to do a rolling sum on a column of a dataframe. See minimal example below :

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]})
df['dt'] = pd.date_range("2020-01-01", "2020-01-05")
index Col1 Col2 Col3 dt.
0 10 13 17 2020-01-01
1 20 23 27 2020-01-02
2 15 18 22 2020-01-03
3 30 33 37 2020-01-04
4 45 48 52 2020-01-05

If I run

df['sum2']=df['Col1'].rolling(window="3d", min_periods=2, on=df['dt']).sum()

then instead of getting what I'm hoping which is a rolling sum on column 1, I get this traceback. If I switch the index to the dt field value it works if I removed the on=df['dt'] param. I've tried on='dt' also with no luck.

This is the error message I get :

...
ValueError: invalid on specified as 0   2020-01-01
1   2020-01-02
2   2020-01-03
3   2020-01-04
4   2020-01-05
Name: dt, dtype: datetime64[ns], must be a column (of DataFrame), an Index or None

Anything I'm overlooking? thanks!

CodePudding user response:

The correct syntax is:

df['sum2'] = df.rolling(window="3d", min_periods=2, on='dt')['Col1'].sum()
print(df)

# Output:
   Col1  Col2  Col3         dt  sum2
0    10    13    17 2020-01-01   NaN
1    20    23    27 2020-01-02  30.0
2    15    18    22 2020-01-03  45.0
3    30    33    37 2020-01-04  65.0
4    45    48    52 2020-01-05  90.0

Your error is to extract the columns Col1 at first so the column dt does not exist when rolling.

>>> df['Col1']  # the column 'dt' does not exist anymore.
0    10
1    20
2    15
3    30
4    45
Name: Col1, dtype: int64
  • Related