I have a dataframe with 145 rows and 135 columns. I want to compute Spearman's rank correlation for each column against every other column (so a 135x135 result). I then want to store these correlations in a new dataframe. (I have not done that yet.)
import pandas as pd
import numpy as np
overview = pd.read_excel(r'overview_20062022.xlsx')
df = pd.DataFrame(overview,
                  columns=['all the column names'])
from scipy.stats import spearmanr
# calculate spearman's correlation
for column in df.iteritems():
    coef, p = spearmanr(df.iteritems(), df.iteritems())
    print('Spearmans correlation coefficient: %.3f' % coef)
    # interpret the significance
    alpha = 0.05
    if p > alpha:
        print('Samples are uncorrelated (fail to reject H0) p=%.3f' % p)
    else:
        print('Samples are correlated (reject H0) p=%.3f' % p)
However, this now leads to NaN. Based on this question, I tried to use iteritems, but unfortunately this has not worked.
CodePudding user response:
You can use pandas' built-in corr function with method='spearman'.
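As a minimal sketch of this suggestion: DataFrame.corr accepts method="spearman" and returns a square dataframe labelled with the column names. The random data and the column names a to d below are placeholders, not the asker's Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 145 rows, 4 numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((145, 4)), columns=["a", "b", "c", "d"])

# Pairwise Spearman rank correlation between all columns,
# returned as a DataFrame indexed and labelled by column name
corr_df = df.corr(method="spearman")
print(corr_df.round(3))
```

Note that corr only returns the coefficients; it does not give p-values.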
CodePudding user response:
You don't need to iterate over the columns; you can get the c x c matrix directly:
df = pd.DataFrame(np.random.rand(30, 3)) # 30 rows, 3 columns, for example
corrs, pvals = spearmanr(df, axis=0)
print(corrs)
Output
array([[ 1.        ,  0.0865406 , -0.13503893],
       [ 0.0865406 ,  1.        ,  0.09010011],
       [-0.13503893,  0.09010011,  1.        ]])
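Since the question also asks for the result in a new dataframe, here is one possible way to wrap both arrays returned by spearmanr, keeping the column labels. It is a sketch on made-up data; the column names x, y, z and the names corr_df, pval_df and significant are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical example data: 30 rows, 3 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((30, 3)), columns=["x", "y", "z"])

# With axis=0 each column is treated as a variable; for three or more
# columns the result is a pair of square matrices
corrs, pvals = spearmanr(df, axis=0)

# Wrap both matrices in DataFrames labelled by the original column names
corr_df = pd.DataFrame(corrs, index=df.columns, columns=df.columns)
pval_df = pd.DataFrame(pvals, index=df.columns, columns=df.columns)

# Boolean mask of pairs whose p-value falls below the chosen alpha
alpha = 0.05
significant = pval_df < alpha
print(corr_df.round(3))
```

One caveat: when the input has exactly two columns, spearmanr returns scalars instead of matrices, so the wrapping step above assumes at least three columns.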