I have a DataFrame with two columns and a date index. I'd like to see how closely the two columns match and move in sync with each other, i.e. do they move in tandem, and does one influence the movements in the other?
                               Col1    Col2
Date
1991-01-01 00:00:00+00:00  6.945847  3.4222
1991-04-01 00:00:00+00:00  8.377481  6.7783
1991-07-01 00:00:00+00:00  7.869787  4.6666
...                             ...     ...
Is there a way to do this in pandas?
I thought of dividing each row by the first row to see the percentage increase, but I wondered whether there is a better statistical way of doing this.
Thanks.
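The rebasing idea in the question can be written directly in pandas with `DataFrame.div`. To judge whether the columns move together, correlating the period-over-period changes from `pct_change` is usually more telling than correlating the levels: two series that both trend upward can look highly correlated in levels even when their quarter-to-quarter moves are unrelated. The minimal sketch below uses only the three rows shown above, treats the dates as UTC timestamps (an assumption), and demonstrates the calls. Three rows are far too few for a meaningful correlation.

```python
import pandas as pd

# The three rows shown in the question; the dates are assumed to be UTC.
df = pd.DataFrame(
    {"Col1": [6.945847, 8.377481, 7.869787],
     "Col2": [3.4222, 6.7783, 4.6666]},
    index=pd.to_datetime(["1991-01-01", "1991-04-01", "1991-07-01"], utc=True),
)
df.index.name = "Date"

# Rebase each column to its first row: 1.0 at the start, 1.2 means +20 %.
rebased = df.div(df.iloc[0])

# Period-over-period percentage changes; the first row is NaN.
changes = df.pct_change()

# Pearson correlation of the changes (rows containing NaN are skipped).
print(rebased)
print(changes.corr())
```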
Answer:
If you want to calculate the Spearman rank correlation coefficient, you can use the SciPy package:
import pandas as pd
from scipy import stats

df = pd.DataFrame({'Date': ['1991-01-01 00:00:00+00:00', '1991-04-01 00:00:00+00:00', '1991-07-01 00:00:00+00:00'],
                   'Col1': [6.945847, 8.377481, 7.869787],
                   'Col2': [3.4222, 6.7783, 4.6666]}).set_index('Date')

# Spearman rank correlation between the two columns, with its p-value
print(stats.spearmanr(df['Col1'], df['Col2']))
Output:
SpearmanrResult(correlation=1.0, pvalue=0.0)
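Since the question asks about pandas specifically, pandas can compute the same Spearman coefficient without calling SciPy directly, through `DataFrame.corr(method="spearman")`. It returns a correlation matrix but no p-value, so `scipy.stats.spearmanr` is still the tool when you need one. With only three observations, neither number says much. A minimal sketch on the same three rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [6.945847, 8.377481, 7.869787],
     "Col2": [3.4222, 6.7783, 4.6666]},
    index=pd.Index(["1991-01-01", "1991-04-01", "1991-07-01"], name="Date"),
)

# Rank correlation matrix; DataFrame.corr also accepts "pearson" and "kendall".
spearman = df.corr(method="spearman")
print(spearman)

# The single coefficient for the pair of columns.
rho = spearman.loc["Col1", "Col2"]
print(rho)
```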
Answer:
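Spearman only tells you the correlation at zero shift (lag 0), i.e. between values on the same dates. To check whether one column leads or follows the other, look at the cross-correlation, which measures the correlation at every shift:
https://en.wikipedia.org/wiki/Cross-correlation
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate.html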
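The idea of correlation at different shifts can be sketched in pandas with `Series.shift` and `Series.corr`. This is a swapped-in alternative to `scipy.signal.correlate`, which returns raw, unnormalized sums of products rather than correlation coefficients. The helper name `lagged_correlations` is hypothetical, and the data are synthetic: `Col2` is built to repeat `Col1` two quarters later, so the peak should appear at lag 2. On real data that trends, pass the changes (e.g. `df["Col1"].pct_change()`) rather than the levels. Keep in mind that a peak at some lag suggests a lead/lag relationship but does not prove that one column influences the other.

```python
import numpy as np
import pandas as pd


def lagged_correlations(x: pd.Series, y: pd.Series, max_lag: int) -> pd.Series:
    """Pearson correlation of x[t] with y[t + k] for k in [-max_lag, max_lag].

    A peak at a positive k suggests moves in x are followed k periods later
    by similar moves in y (x leads y); a peak at a negative k suggests y leads x.
    """
    result = {}
    for k in range(-max_lag, max_lag + 1):
        # shift(-k) aligns y[t + k] with x[t]; corr() drops the NaN rows
        # that the shift introduces at the ends.
        result[k] = x.corr(y.shift(-k))
    return pd.Series(result, name="corr")


# Synthetic quarterly data (made up for illustration): Col2 repeats Col1's
# values two quarters later, plus a little noise.
rng = np.random.default_rng(0)
dates = pd.date_range("1991-01-01", periods=60, freq="QS")
col1 = pd.Series(rng.normal(size=60), index=dates, name="Col1")
col2 = (col1.shift(2) + 0.1 * rng.normal(size=60)).rename("Col2")

corrs = lagged_correlations(col1, col2, max_lag=4)
print(corrs)
print("strongest lag:", corrs.idxmax())
```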