I'm trying something very simple, seemingly at least which is to do a rolling sum on a column of a dataframe. See minimal example below :
df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
"Col2": [13, 23, 18, 33, 48],
"Col3": [17, 27, 22, 37, 52]})
df['dt'] = pd.date_range("2020-01-01", "2020-01-05")
index | Col1 | Col2 | Col3 | dt. |
---|---|---|---|---|
0 | 10 | 13 | 17 | 2020-01-01 |
1 | 20 | 23 | 27 | 2020-01-02 |
2 | 15 | 18 | 22 | 2020-01-03 |
3 | 30 | 33 | 37 | 2020-01-04 |
4 | 45 | 48 | 52 | 2020-01-05 |
If I run
df['sum2']=df['Col1'].rolling(window="3d", min_periods=2, on=df['dt']).sum()
then instead of getting what I'm hoping which is a rolling sum on column 1, I get this traceback. If I switch the index to the dt field value it works if I removed the on=df['dt']
param. I've tried on='dt'
also with no luck.
This is the error message I get :
...
ValueError: invalid on specified as 0 2020-01-01
1 2020-01-02
2 2020-01-03
3 2020-01-04
4 2020-01-05
Name: dt, dtype: datetime64[ns], must be a column (of DataFrame), an Index or None
Anything I'm overlooking? thanks!
CodePudding user response:
The correct syntax is:
df['sum2'] = df.rolling(window="3d", min_periods=2, on='dt')['Col1'].sum()
print(df)
# Output:
Col1 Col2 Col3 dt sum2
0 10 13 17 2020-01-01 NaN
1 20 23 27 2020-01-02 30.0
2 15 18 22 2020-01-03 45.0
3 30 33 37 2020-01-04 65.0
4 45 48 52 2020-01-05 90.0
Your error is to extract the columns Col1
at first so the column dt
does not exist when rolling
.
>>> df['Col1'] # the column 'dt' does not exist anymore.
0 10
1 20
2 15
3 30
4 45
Name: Col1, dtype: int64