How to ignore NaN values for a rolling mean calculation in pandas DataFrame?

Time:12-01

I am trying to create a DataFrame containing a rolling mean over a window of length 5. My data contains one NaN value, and because of it I get only NaN values for Column3. How can I ignore NaN values when using .rolling(5).mean()?

I have this sample data df1:

    Column1 Column2 Column3 Column4
0   1       5       -9.0    13
1   1       6       -10.0   15
2   3       7       -5.0    11
3   4       8       NaN     9
4   6       5       -2.0    8
5   2       8       0.0     10
6   3       8       -3.0    12

For convenience:

import numpy as np
import pandas as pd

# create DataFrame with one NaN in Column3
df1 = pd.DataFrame({
                    'Column1':[1, 1, 3, 4, 6, 2, 3], 
                    'Column2':[5, 6, 7, 8, 5, 8, 8], 
                    'Column3':[-9, -10, -5, np.nan, -2, 0, -3],  # np.nan directly, so the column stays float
                    'Column4':[13, 15, 11, 9, 8, 10, 12]
                    })
df1

When I compute a rolling mean with a window of 5, I get only NaN values for Column3.

df2 = df1.rolling(5).mean()


    Column1 Column2 Column3 Column4
0   NaN     NaN     NaN     NaN
1   NaN     NaN     NaN     NaN
2   NaN     NaN     NaN     NaN
3   NaN     NaN     NaN     NaN
4   3.0     6.2     NaN     11.2
5   3.2     6.8     NaN     10.6
6   3.6     7.2     NaN     10.0

CodePudding user response:

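pandas' DataFrame.mean has a skipna flag that tells it to ignore NaNs, see

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html

The rolling version, .rolling(5).mean(), does not accept skipna, though. A rolling window already leaves NaNs out of the average; it returns NaN because the window holds fewer non-NaN values than min_periods, which defaults to the window size. Lowering min_periods fixes that.

Try

df2 = df1.rolling(5, min_periods=1).mean()

or (note that pd.np was removed in pandas 2.0, so import numpy as np directly)

df2 = df1.rolling(5, min_periods=1).apply(np.nanmean, raw=True)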
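A minimal sketch of the min_periods route, assuming the goal is to average whatever non-NaN values fall inside each 5-row window. min_periods is a documented argument of DataFrame.rolling; the name col3_mean is only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# min_periods=1: a window yields a value as soon as it holds one non-NaN entry.
# The NaN in row 3 is simply left out of the windows that contain it.
df2 = df1.rolling(5, min_periods=1).mean()

col3_mean = df2['Column3'].tolist()  # illustrative name, not from the question
print(df2)
```

With min_periods=1 the first four rows are also filled, using partial windows. Choosing min_periods=4 instead would tolerate one missing value per window while still leaving rows 0 to 2 as NaN.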

CodePudding user response:

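You can fill the NaN with either 0 or the column mean before taking the rolling mean. Note that the filled value then counts toward every window that contains it.

The following works: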

df1 = df1.fillna(df1.mean())

df2 = df1.rolling(5).mean()
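To see how much the choice of fill value matters, here is a small sketch on Column3 alone. It assumes the same sample data as the question; the names rolled_zero and rolled_mean are hypothetical and exist only for this comparison.

```python
import numpy as np
import pandas as pd

col3 = pd.Series([-9, -10, -5, np.nan, -2, 0, -3], name='Column3')

# Fill with 0: the gap is treated as a real observation of 0
rolled_zero = col3.fillna(0).rolling(5).mean()

# Fill with the column mean (computed over the six non-NaN values)
rolled_mean = col3.fillna(col3.mean()).rolling(5).mean()

print(pd.DataFrame({'fill_0': rolled_zero, 'fill_mean': rolled_mean}))
```

Both variants keep the default min_periods, so rows 0 to 3 stay NaN, but the values in rows 4 to 6 differ because the filled number is averaged in as if it had been measured.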
