How to return the correlation value from pandas dataframe

Time:10-19

I am working on a method for calculating the correlation between two columns of data from a dataset. The dataset consists of 4 columns: A1, A2, A3, and Class. My goal is to remove A3 if the correlation between A1 and A3 is greater than 0.6 or less than 0.6.

A sample of the data set is given below:

A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3

The Python program that I am using for this project is written like so:

import pandas as pd
import numpy as np

def main():
    s = pd.read_csv('A1-dm.csv')
    print(calculate_correlation(s))

def calculate_correlation(s):
    # if correlation > 0.6 or correlation < 0.6 remove A3
    s = s[['A1','A3']]
    return s.corr()[1,0]

main()

When I run my code I get the following error:

File "C:\Users\physe\AppData\Roaming\Python\Python36\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: (1, 0)

I've reviewed the documentation here. The issue that I'm facing is selecting the [1, 0] element from the correlation matrix that is returned by .corr(). Any help with this would be greatly appreciated.

CodePudding user response:

Here is my example:

cor = df.corr()
# Compare the single A1/A3 entry; cor['A3'] alone is a whole column and cannot be used in an if
if cor.loc['A1', 'A3'] > 0.6:
    df.drop(columns='A3', inplace=True)

CodePudding user response:

Try:

corr = df.corr()
if corr['A3'].loc['A1']!=0.6:
    df.drop(columns=['A3'], inplace=True)
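
As a side note, the same value can also be read without indexing into the matrix at all. The sketch below is only an illustration: it rebuilds a small frame from the A1 and A3 values in the question's sample (the name df and the inline data are assumptions, not the asker's file) and compares the label-based matrix lookup with Series.corr.

```python
import pandas as pd

# A1 and A3 columns copied from the sample rows in the question.
df = pd.DataFrame({
    'A1': [2, 8, 6, 5, 2, 2, 6, 8],
    'A3': [1.5, 3.0, 2.5, 2.3, 1.5, 1.5, 2.5, 3.0],
})

# Label-based lookup in the full correlation matrix.
r_matrix = df.corr().loc['A1', 'A3']

# Pairwise Pearson correlation between two Series, no matrix needed.
r_series = df['A1'].corr(df['A3'])

print(r_matrix, r_series)
```

Both expressions should give the same number, so which one to use is mostly a matter of taste.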

CodePudding user response:

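Use .iloc to get the 1,0 element from the correlation matrix. .corr() returns a DataFrame, and plain [1,0] indexing on a DataFrame is treated as a column-label lookup, which is why it raises KeyError: (1, 0).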

Here:

def calculate_correlation(s):
    # if correlation > 0.6 or correlation < 0.6 remove A3
    s = s[['A1','A3']]
    return s.corr().iloc[1, 0]
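
For completeness, here is a minimal end-to-end sketch built around the .iloc lookup above. It inlines the sample rows from the question instead of reading A1-dm.csv, and it assumes the intended rule is "greater than 0.6 or less than -0.6", i.e. an absolute value above 0.6. The helper drop_if_correlated and its threshold parameter are hypothetical names introduced only for this example.

```python
import io
import pandas as pd

# The eight sample rows from the question, inlined so no file is needed.
SAMPLE_CSV = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
"""

def calculate_correlation(s):
    # Positional lookup: row 1 (A3), column 0 (A1) of the 2x2 correlation matrix.
    return s[['A1', 'A3']].corr().iloc[1, 0]

def drop_if_correlated(s, threshold=0.6):
    # Assumption: the rule is |r| > threshold, not the literal "< 0.6".
    r = calculate_correlation(s)
    if abs(r) > threshold:
        # Return a new frame without A3; the input frame is left untouched.
        return s.drop(columns=['A3'])
    return s

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
r = calculate_correlation(df)
result = drop_if_correlated(df)
print(r)
print(list(result.columns))
```

On the sample rows A1 and A3 are strongly positively correlated, so A3 is dropped from the returned frame while df itself keeps all four columns.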