The code below outputs only NaN values:
import numpy as np
import pandas as pd
df = pd.DataFrame({'B': [0, 1, 3, np.nan, 4, 5, 6], 'A': [0, 1, 2, 3, 4, 5, 6]})
df["corr"] = df['A'].rolling(4).corr(df['B'], min_periods=1)  # min_periods is passed to corr() here
print(df["corr"])
It seems that the min_periods option has no effect. I would like the same behavior as:
df = pd.DataFrame({'B': [0, 1, 3, np.nan], 'A': [0, 1, 2, 3]})
print(df.corr())
That prints the correct correlation even with a NaN value. I can't just filter out the NaN rows, because I'm working with a time series and that would give me windows covering different time periods.
I'm using the latest pandas version (1.4.3).
CodePudding user response:
min_periods is a parameter of rolling(), not of corr(). Pass it to rolling() instead:
df['cor'] = df['A'].rolling(4, min_periods=1).corr(df['B'])
Out[305]:
0 NaN
1 1.00000000
2 0.98198051
3 0.98198051
4 0.92857143
5 0.98198051
6 1.00000000
dtype: float64
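As a sanity check, here is a minimal self-contained sketch that compares the rolling result with the df.corr() behavior asked for in the question. The variable names rolling_corr and reference are only illustrative. It assumes that the window ending at index 3 should match df.corr() on the first four rows, since both drop the NaN pair before computing the correlation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "B": [0, 1, 3, np.nan, 4, 5, 6],
    "A": [0, 1, 2, 3, 4, 5, 6],
})

# min_periods goes to rolling(): a window produces a value as soon as it
# holds at least that many valid observations.
rolling_corr = df["A"].rolling(4, min_periods=1).corr(df["B"])

# Reference value: DataFrame.corr() on the first four rows, which excludes
# the NaN pair before computing the Pearson correlation.
reference = df.iloc[:4].corr().loc["A", "B"]

print(rolling_corr)
print(np.isclose(rolling_corr.iloc[3], reference))
```

The first row stays NaN because a correlation cannot be computed from a single pair of values.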