I have the following DataFrame sample:
df = pd.DataFrame({'date':['2021-05-03','2021-05-10','2021-05-17','2021-05-24',
'2021-05-31','2021-06-07','2021-06-14','2021-06-21','2021-06-28','2021-07-05','2021-07-12','2021-07-19','2021-05-26'], 'spend':[1418,4130,4216,3374,3587,3665,4118,4534,4829,3156,2998,3025,3397]})
This is the code used:
df['spend_avg'] = df['spend'].rolling(7).median()
This is the output that I got:
df = pd.DataFrame({'date' : ['2021-05-03','2021-05-10','2021-05-17','2021-05-24',
'2021-05-31','2021-06-07','2021-06-14','2021-06-21','2021-06-28','2021-07-05','2021-07-12','2021-07-19','2021-05-26'], 'spend':[1418,4130,4216,3374,3587,3665,4118,4534,4829,3156,2998,3025,3397], 'spend_avg' :[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,3665.0,4118.0,4118.0,3665.0,3665.0,3665.0,3397.0]})
As you can see, this is not computing a rolling average (window = 7). I understand that the leading NaNs are expected, but the values in the spend_avg column are just values repeated from the spend column.
Why is this happening? What am I doing wrong?
The desired output would be:
df = pd.DataFrame({'date' : ['2021-05-03','2021-05-10','2021-05-17','2021-05-24','2021-05-31','2021-06-07','2021-06-14','2021-06-21','2021-06-28','2021-07-05','2021-07-12','2021-07-19','2021-05-26'], 'spend_avg' :[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,3501,3946,3894,3841,3665,3760,3722]})
Thanks!
CodePudding user response:
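You want mean, not median. The median of a 7-value window is always one of the values in that window, which is why your output looks like repeated spend values: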
In [667]: df[['spend']].rolling(window=7).mean()
Out[667]:
          spend
0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
5           NaN
6   3501.142857
7   3946.285714
8   4046.142857
9   3894.714286
10  3841.000000
11  3760.714286
12  3722.428571
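For comparison, here is a minimal, self-contained sketch that puts both results side by side using the spend values from the question. The column name spend_median is only an illustrative choice and does not appear in the original post; the date column is left out because it plays no part in the calculation.

```python
import pandas as pd

# Spend values copied from the question, in the same row order.
spend = [1418, 4130, 4216, 3374, 3587, 3665, 4118,
         4534, 4829, 3156, 2998, 3025, 3397]
df = pd.DataFrame({'spend': spend})

# Rolling median: with an odd window size, the result is always the middle
# element of the sorted window, so it is one of the original spend values.
df['spend_median'] = df['spend'].rolling(window=7).median()

# Rolling mean: the arithmetic average of each 7-row window,
# which is what the question is actually after.
df['spend_avg'] = df['spend'].rolling(window=7).mean()

print(df)
```

The first six rows stay NaN in both columns because a full window of 7 rows is not available until index 6.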