Pandas Rolling Function is not working properly

Time:05-20

I have the following DataFrame sample:

    import pandas as pd

    df = pd.DataFrame({
        'date': ['2021-05-03', '2021-05-10', '2021-05-17', '2021-05-24', '2021-05-31',
                 '2021-06-07', '2021-06-14', '2021-06-21', '2021-06-28', '2021-07-05',
                 '2021-07-12', '2021-07-19', '2021-05-26'],
        'spend': [1418, 4130, 4216, 3374, 3587, 3665, 4118,
                  4534, 4829, 3156, 2998, 3025, 3397],
    })

This is the code used:

    df['spend_avg'] = df['spend'].rolling(7).median()

This is the output that I got:

    df = pd.DataFrame({
        'date': ['2021-05-03', '2021-05-10', '2021-05-17', '2021-05-24', '2021-05-31',
                 '2021-06-07', '2021-06-14', '2021-06-21', '2021-06-28', '2021-07-05',
                 '2021-07-12', '2021-07-19', '2021-05-26'],
        'spend': [1418, 4130, 4216, 3374, 3587, 3665, 4118,
                  4534, 4829, 3156, 2998, 3025, 3397],
        'spend_avg': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
                      3665.0, 4118.0, 4118.0, 3665.0, 3665.0, 3665.0, 3397.0],
    })

As you can see, it is not calculating the rolling average (window = 7). I understand that the leading NaNs are expected, but the values in the spend_avg column are just values repeated from the spend column.

Why is this happening? What am I doing wrong?

The desired output would be:

    df = pd.DataFrame({
        'date': ['2021-05-03', '2021-05-10', '2021-05-17', '2021-05-24', '2021-05-31',
                 '2021-06-07', '2021-06-14', '2021-06-21', '2021-06-28', '2021-07-05',
                 '2021-07-12', '2021-07-19', '2021-05-26'],
        'spend_avg': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
                      3501, 3946, 4046, 3894, 3841, 3760, 3722],
    })

Thanks!

CodePudding user response:

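You want the mean, not the median. With an odd window of 7, the rolling median is always the middle value of the sorted window, so every result is a value that already appears in the spend column. The rolling mean gives the averages you expect: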

In [667]: df[['spend']].rolling(window=7).mean()
Out[667]: 
          spend
0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
5           NaN
6   3501.142857
7   3946.285714
8   4046.142857
9   3894.714286
10  3841.000000
11  3760.714286
12  3722.428571
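
To see the difference side by side, here is a minimal sketch that uses only the spend values from the question. The names `result`, `rolling_median`, `rolling_mean`, and `median_is_copied` are made up for this illustration and are not part of the original code.

```python
import pandas as pd

# Spend values copied from the question (the date column is not needed here).
spend = pd.Series(
    [1418, 4130, 4216, 3374, 3587, 3665, 4118,
     4534, 4829, 3156, 2998, 3025, 3397],
    name="spend",
)

window = 7
result = pd.DataFrame({
    "spend": spend,
    # Median: the middle value of each sorted 7-row window.
    "rolling_median": spend.rolling(window).median(),
    # Mean: the sum of each 7-row window divided by 7.
    "rolling_mean": spend.rolling(window).mean(),
})

# With an odd window, every median is one of the values inside its own window.
median_is_copied = all(
    result["rolling_median"].iloc[i] in spend.iloc[i - window + 1 : i + 1].values
    for i in range(window - 1, len(spend))
)

print(result.round(2))
print("every median is a value from its window:", median_is_copied)
```

The first six rows are NaN in both columns because a full window of 7 rows is not available yet. From row 6 onward, `rolling_median` only ever shows numbers that already exist in `spend`, while `rolling_mean` shows the averages.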