Pandas rolling correlation always returns NaN when there is a NaN. Not the same behavior as DataFram

Time:07-13

The code below outputs only NaN values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [0, 1, 3, np.nan, 4, 5, 6], 'A': [0, 1, 2, 3, 4, 5, 6]})
df["corr"] = df['A'].rolling(4).corr(df['B'], min_periods=1)
print(df["corr"])

It seems that the min_periods option has no effect. I would like the same behavior as:

df = pd.DataFrame({'B': [0, 1, 3, np.nan], 'A': [0, 1, 2, 3]})
print(df.corr())

That prints the correct correlation even though B contains a NaN. I can't simply drop the NaN rows because I'm working with a time series, and doing so would give me windows that span different time periods.
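To make the behavior described above concrete, here is a minimal sketch (not part of the original question) that compares DataFrame.corr() on the frame containing the NaN with the same call after dropping the incomplete row. The variable names with_nan and without_nan are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [0, 1, 3, np.nan], 'A': [0, 1, 2, 3]})

# DataFrame.corr() excludes missing values pair by pair, so the row
# where B is NaN does not take part in the A/B correlation.
with_nan = df.corr().loc['A', 'B']

# Dropping the incomplete row first should therefore give the same number.
without_nan = df.dropna().corr().loc['A', 'B']

print(with_nan, without_nan)
```

Both values should agree, which is the pairwise NaN handling the question would like to see inside each rolling window.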

I'm using the latest pandas version (1.4.3).

CodePudding user response:

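min_periods is a parameter of rolling(), not of corr(), so pass it there. For an integer window it defaults to the window size, which is why every window containing the NaN came out as NaN: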

df['cor'] = df['A'].rolling(4, min_periods=1).corr(df['B'])
Out[305]: 
0           NaN
1    1.00000000
2    0.98198051
3    0.98198051
4    0.92857143
5    0.98198051
6    1.00000000
dtype: float64
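As a sanity check on the output above, the following sketch recomputes each window by hand and compares it with the rolling result. It assumes the same seven-row frame as in the question; the names rolled and manual are hypothetical, and the manual loop is only a cross-check, not how pandas computes the rolling correlation internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [0, 1, 3, np.nan, 4, 5, 6],
                   'A': [0, 1, 2, 3, 4, 5, 6]})

# min_periods goes on rolling(): a window is evaluated as soon as it
# holds at least this many valid observations.
rolled = df['A'].rolling(4, min_periods=1).corr(df['B'])

# Cross-check: slice each window of up to 4 rows by hand and use
# Series.corr, which ignores rows where either value is NaN.
manual = pd.Series(
    [df['A'].iloc[max(0, i - 3):i + 1].corr(df['B'].iloc[max(0, i - 3):i + 1])
     for i in range(len(df))],
    index=df.index,
)

print(pd.DataFrame({'rolling': rolled, 'manual': manual}))
```

The two columns should match row by row (the first row is NaN in both, since a single pair has no defined correlation). The window boundaries stay fixed at four rows, so the time span of each window is unchanged; only the NaN pair inside a window is skipped.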