How to display uncorrelated columns using python code without plotting

Time:10-02

I have a DataFrame where some columns are correlated and some are not. I want to display only the uncorrelated columns as output. Can anyone help me solve this? I don't want to plot anything, just display the names of the uncorrelated columns.

CodePudding user response:

You can first compute the correlation matrix with df.corr(), then look up the column names whose absolute correlation is at or below a threshold.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.RandomState(0).rand(10, 10))
corr = df.corr()
#   0           1           2           3           4            5          6           7           8           9
#0  1.000000    0.347533    0.398948    0.455743    0.072914    -0.233402   -0.731222   0.477978    -0.442621   0.015185
#1  0.347533    1.000000    -0.284056   0.571003    -0.285483   0.382480    -0.362842   0.642578    0.252556    0.190047
#2  0.398948    -0.284056   1.000000    -0.523649   0.152937    -0.139176   -0.092895   0.016266    -0.434016   -0.383585
#3  0.455743    0.571003    -0.523649   1.000000    -0.225343   -0.227577   -0.481548   0.473286    0.279258    0.446650
#4  0.072914    -0.285483   0.152937    -0.225343   1.000000    -0.104438   -0.147477   -0.523283   -0.614603   -0.189916
#5  -0.233402   0.382480    -0.139176   -0.227577   -0.104438   1.000000    -0.030252   0.417640    0.205851    0.095084
#6  -0.731222   -0.362842   -0.092895   -0.481548   -0.147477   -0.030252   1.000000    -0.494440   0.381407    -0.353652
#7  0.477978    0.642578    0.016266    0.473286    -0.523283   0.417640    -0.494440   1.000000    0.375873    0.417863
#8  -0.442621   0.252556    -0.434016   0.279258    -0.614603   0.205851    0.381407    0.375873    1.000000    0.150421
#9  0.015185    0.190047    -0.383585   0.446650    -0.189916   0.095084    -0.353652   0.417863    0.150421    1.000000


threshold = 0.2
uncorr = (corr[(corr.abs() > threshold)].fillna('True').apply(lambda row: row[row == 'True'].index.tolist(), axis=1))
uncorr_df = uncorr.to_frame('col_name_uncorrelated')
# 0 with 4,9 uncorrelated
# 1 with 9 uncorrelated
...
# 9 with 0, 1, 4, 5, 8 uncorrelated

Output:

>>> uncorr_df

    col_name_uncorrelated
0   [4, 9]
1   [9]
2   [4, 5, 6, 7]
3   []
4   [0, 2, 5, 6, 9]
5   [2, 4, 6, 9]
6   [2, 4, 5]
7   [2]
8   [9]
9   [0, 1, 4, 5, 8]
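
If what you want is a single list of columns that are not correlated with any other column, the per-row lists above can be reduced further. The following is only a sketch under that interpretation of the question: the function name uncorrelated_columns and the demo data are made up for illustration, and the 0.2 default simply mirrors the threshold used above.

```python
import numpy as np
import pandas as pd


def uncorrelated_columns(df, threshold=0.2):
    """Names of columns whose |correlation| with every other column is <= threshold."""
    corr = df.corr().abs()
    # Hide the diagonal: each column's correlation with itself is always 1.
    off_diag = corr.mask(np.eye(len(corr), dtype=bool))
    # max() skips the masked entries, leaving each column's strongest correlation.
    strongest = off_diag.max(axis=1)
    return strongest[strongest <= threshold].index.tolist()


# Made-up data: "b" is an exact multiple of "a"; "c" has zero correlation with both.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [1, -2, 2, -2, 1],
})
print(uncorrelated_columns(demo))  # ['c']
```

Taking the maximum off-diagonal value per column means a column is only reported when even its strongest relationship stays at or below the threshold, which is stricter than the pairwise lists shown above.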

CodePudding user response:

First, calculate the correlation matrix:

import numpy as np
import pandas as pd
myDataFrame=pd.DataFrame(data)  # data: your own data

correl=myDataFrame.corr()

Define what you mean by "uncorrelated". Here I treat a pair of columns as uncorrelated when the absolute value of its correlation is below 0.5.

uncor_level=0.5

The following code will give you the names of the pairs that are uncorrelated:

pairs=np.full([len(correl)**2,2],None) #define an empty array to store the results
z=0
for x in range(0,len(correl)): #loop for each row(index)
    for y in range(0,len(correl)): #loop for each column
        if abs(correl.iloc[x,y])<uncor_level:
            pair=[correl.index[x],correl.columns[y]]
            pairs[z]=pair
            z=z+1
pairs=pairs[:z] #keep only the rows that were filled in
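
The nested loop above visits every cell of the correlation matrix, so each uncorrelated pair is recorded twice, once as (a, b) and once as (b, a). As a possible alternative, the sketch below uses NumPy to look only at the upper triangle of the matrix so each pair appears once. The function name uncorrelated_pairs and the demo data are assumptions made for illustration; the 0.5 default mirrors uncor_level above.

```python
import numpy as np
import pandas as pd


def uncorrelated_pairs(df, level=0.5):
    """List each pair of columns with |correlation| below `level` exactly once."""
    corr = df.corr().abs()
    # Strict upper triangle: skips the diagonal and the mirrored (b, a) entries.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    rows, cols = np.where(upper & (corr.to_numpy() < level))
    return [(corr.index[r], corr.columns[c]) for r, c in zip(rows, cols)]


# Made-up data: "b" is an exact multiple of "a"; "c" has zero correlation with both.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [1, -2, 2, -2, 1],
})
print(uncorrelated_pairs(demo))  # [('a', 'c'), ('b', 'c')]
```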