Spearman's rank correlation for each column

Time:06-21

I have a dataframe with 145 rows and 135 columns. I want to compute Spearman's rank correlation for each column against every other column (so a 135x135 result). I then want to store these correlations in a new dataframe. (I have not done that yet.)

import pandas as pd
import numpy as np

overview = pd.read_excel(r'overview_20062022.xlsx')
df = pd.DataFrame(overview,
                  columns=['all the column names'])

from scipy.stats import spearmanr

# calculate spearman's correlation
for column in df.iteritems():
    coef, p = spearmanr(df.iteritems(), df.iteritems())
    print('Spearmans correlation coefficient: %.3f' % coef)
    # interpret the significance
    alpha = 0.05
    if p > alpha:
        print('Samples are uncorrelated (fail to reject H0) p=%.3f' % p)
    else:
        print('Samples are correlated (reject H0) p=%.3f' % p)

However, this now returns NaN. I tried to use iteritems based on another question, but unfortunately that has not worked.

CodePudding user response:

You can use pandas' built-in corr function with method='spearman'.
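
As a minimal sketch of that suggestion: the data below is random stand-in data with made-up column names (the real frame would come from the Excel file in the question). Note that corr only returns the coefficients, not p-values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 145 rows, 5 columns with made-up names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((145, 5)), columns=list("abcde"))

# Pairwise Spearman correlation between all columns.
# The result is already a DataFrame labelled by column name.
corr = df.corr(method="spearman")
print(corr.round(3))
```

For the 135-column frame in the question, the same call should give a 135x135 dataframe, assuming the columns are numeric.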

CodePudding user response:

You don't need to iterate over the columns; spearmanr can return the full column-by-column correlation matrix directly:

df = pd.DataFrame(np.random.rand(30, 3)) # 30 rows, 3 columns, for example
corrs, pvals = spearmanr(df, axis=0)
print(corrs)

Output

array([[ 1.        ,  0.0865406 , -0.13503893],
       [ 0.0865406 ,  1.        ,  0.09010011],
       [-0.13503893,  0.09010011,  1.        ]])
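
Since the question also asks for the result in a new dataframe, here is a rough sketch of how the two arrays could be wrapped with the original column labels. The column names x, y, z, the random data, and the 0.05 threshold are placeholders for illustration, not taken from the asker's file.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical example data: 30 rows, 3 columns with placeholder names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((30, 3)), columns=["x", "y", "z"])

# With more than two columns, spearmanr returns two square arrays:
# the correlation coefficients and the matching p-values
corrs, pvals = spearmanr(df.to_numpy(), axis=0)

# Wrap both arrays in dataframes labelled with the original column names
corr_df = pd.DataFrame(corrs, index=df.columns, columns=df.columns)
pval_df = pd.DataFrame(pvals, index=df.columns, columns=df.columns)

# Boolean mask of pairs whose p-value is below the chosen alpha (0.05 here)
significant = pval_df < 0.05

print(corr_df.round(3))
print(pval_df.round(3))
```

One caveat: when the input has exactly two columns, spearmanr returns a single coefficient and p-value instead of matrices, so the wrapping above only applies to three or more columns.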