Scipy reporting different spearman correlation than pandas

Time: 11-10

import scipy.stats

# n1 and n2 are the two pandas Series printed below
print(n1)
print(n2)
print(type(n1), type(n2))
print(scipy.stats.spearmanr(n1, n2))
print(n1.corr(n2, method="spearman"))

Output:
0    2317.0
1    2293.0
2    1190.0
3     972.0
4    1391.0
Name: r6000, dtype: float64
0.0    2317.0
1.0    2293.0
3.0    1190.0
4.0     972.0
5.0    1391.0
Name: 6000, dtype: float64
<class 'pandas.core.series.Series'> <class 'pandas.core.series.Series'>
SpearmanrResult(correlation=0.9999999999999999, pvalue=1.4042654220543672e-24)
0.7999999999999999

The problem is that scipy reports a Spearman correlation of 1.0 for these two series, while pandas reports 0.8.
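A self-contained reproduction of the discrepancy, assuming n1 and n2 can be rebuilt from the printed values above (the real series presumably came from a larger frame, so the names are dropped here):

```python
import pandas as pd
from scipy import stats

values = [2317.0, 2293.0, 1190.0, 972.0, 1391.0]
# Same values, different index labels, as in the printed output
n1 = pd.Series(values)                                   # labels 0, 1, 2, 3, 4
n2 = pd.Series(values, index=[0.0, 1.0, 3.0, 4.0, 5.0])  # labels 0.0, 1.0, 3.0, 4.0, 5.0

rho_scipy, p_scipy = stats.spearmanr(n1, n2)   # pairs values by position
rho_pandas = n1.corr(n2, method="spearman")    # pairs values by index label

print(rho_scipy, rho_pandas)
```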

Edit to add:

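The issue is that the indexes don't match: n1 is labeled 0, 1, 2, 3, 4, while n2 is labeled 0.0, 1.0, 3.0, 4.0, 5.0. Pandas automatically aligns the two series on their index labels before correlating them, but scipy doesn't. I've answered it below.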
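To see which pairs pandas actually correlates, the sketch below aligns the two series with an inner join on the index, which, as far as I know, is what Series.corr does internally before computing. The series are rebuilt from the question's output; ranking both sides and taking the Pearson correlation of the ranks is the standard definition of Spearman's rho when there are no ties.

```python
import pandas as pd

values = [2317.0, 2293.0, 1190.0, 972.0, 1391.0]
n1 = pd.Series(values)
n2 = pd.Series(values, index=[0.0, 1.0, 3.0, 4.0, 5.0])

# Keep only labels present in both series: 0, 1, 3, 4
a, b = n1.align(n2, join="inner")
print(pd.DataFrame({"n1": a, "n2": b}))

# Pearson correlation of the ranks = Spearman on the aligned pairs
rho = a.rank().corr(b.rank())
print(rho)
```

Labels 2 (only in n1) and 5 (only in n2) are dropped, and rows 3 and 4 no longer hold equal values, which is where the 0.8 comes from.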

Answer:

Pandas doesn't have a function that calculates p-values, so it is better to use SciPy to calculate the correlation, since it gives you both the correlation coefficient and the p-value. The other alternative is to calculate the p-value yourself, again using SciPy. Note one thing: if you calculate the correlation on a sample of your data using pandas, there is a high risk that the correlation changes when you change the sample. This is why you need the p-value.
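A minimal sketch of both routes the answer mentions, using the four label-aligned pairs from the question. scipy.stats.spearmanr returns the coefficient and the p-value together; the manual version uses the t-approximation t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom, which, to my knowledge, is what SciPy uses for its default two-sided test.

```python
import numpy as np
from scipy import stats

# The four pairs pandas ends up correlating after index alignment
x = np.array([2317.0, 2293.0, 972.0, 1391.0])
y = np.array([2317.0, 2293.0, 1190.0, 972.0])

# SciPy returns the coefficient and the two-sided p-value together
rho, p_scipy = stats.spearmanr(x, y)

# Manual p-value from the t-approximation with n - 2 degrees of freedom
n = len(x)
t = rho * np.sqrt((n - 2) / (1 - rho**2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)

print(rho, p_scipy, p_manual)
```

With only four observations the p-value comes out at 0.2, so a correlation of 0.8 here is not significant at conventional levels, which illustrates the answer's point about sampling risk.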

Answer:

I made a copy of each series and called reset_index(drop=True) on it before correlating them. That fixed it.
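A sketch of that fix on series rebuilt from the question's output. drop=True matters: without it, Series.reset_index() returns a DataFrame with the old index as a column instead of a Series.

```python
import pandas as pd
from scipy import stats

values = [2317.0, 2293.0, 1190.0, 972.0, 1391.0]
n1 = pd.Series(values)
n2 = pd.Series(values, index=[0.0, 1.0, 3.0, 4.0, 5.0])

# Discard the old labels so both series share a 0..4 RangeIndex
a = n1.reset_index(drop=True)
b = n2.reset_index(drop=True)

rho_pandas = a.corr(b, method="spearman")
rho_scipy, _ = stats.spearmanr(a, b)
print(rho_pandas, rho_scipy)  # both now pair values by position
```

This is only the right fix if the two series really are meant to be paired by position. If the index labels carry meaning (for example, the same IDs in both series), the 0.8 that pandas reported is the correct answer.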

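The cause is pandas' automatic data alignment: Series.corr pairs values by index label and drops labels that appear in only one series, so with the original indexes it correlated only the four rows labeled 0, 1, 3 and 4.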

SciPy doesn't do any data alignment: spearmanr converts its inputs to NumPy arrays and pairs values by position, which is why it saw two identical sequences and returned 1.0.
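Because SciPy silently pairs by position, one defensive option is to check the indexes before calling it. spearman_checked below is a hypothetical helper, not part of pandas or SciPy:

```python
import pandas as pd
from scipy import stats


def spearman_checked(a: pd.Series, b: pd.Series):
    """Hypothetical helper: refuse to correlate series whose index labels differ."""
    if not a.index.equals(b.index):
        raise ValueError("index labels differ; align or reset_index(drop=True) first")
    # Safe: position i in a has the same label as position i in b
    return stats.spearmanr(a.to_numpy(), b.to_numpy())


values = [2317.0, 2293.0, 1190.0, 972.0, 1391.0]
n1 = pd.Series(values)
n2 = pd.Series(values, index=[0.0, 1.0, 3.0, 4.0, 5.0])

try:
    spearman_checked(n1, n2)
except ValueError as exc:
    print("refused:", exc)

rho, p = spearman_checked(n1, n2.reset_index(drop=True))
print(rho, p)
```

Failing loudly forces an explicit choice between label alignment and positional pairing, instead of letting the two libraries quietly disagree.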
