Drop the features that have low correlation with the target variable

Time:09-06

I have loaded a dataset and computed the correlation coefficient of each feature with the target variable.

Here is the code:

from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns


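#Loading the dataset
#Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
x = load_boston()
df = pd.DataFrame(x.data, columns = x.feature_names)
df["MEDV"] = x.target
X = df.drop("MEDV", axis=1)   #Feature Matrix (axis must be passed by keyword in pandas 2.x)
y = df["MEDV"]                #Target Variable
df.head()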


#Using Pearson Correlation
plt.figure(figsize=(12,10))
cor = df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()


#Correlation with output variable
cor_target = abs(cor["MEDV"])

#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.4]
print(relevant_features)

How do I drop the features whose absolute correlation coefficient with the target is below 0.4?

CodePudding user response:

Try this:

#Selecting the weakly correlated features
irrelevant_features = cor_target[cor_target<0.4]

#List of the irrelevant feature names
cols = list(irrelevant_features.index)

#Dropping the irrelevant features
df = df.drop(cols, axis=1)
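
The same idea can be wrapped in a small helper. This is only a sketch: the function name `drop_weak_features` and the synthetic columns `strong` and `noise` are made up for illustration, and the data is randomly generated rather than taken from the Boston dataset, so it runs without `load_boston`.

```python
import numpy as np
import pandas as pd


def drop_weak_features(df, target, threshold=0.4):
    """Return a copy of df without the columns whose absolute Pearson
    correlation with `target` is below `threshold`."""
    cor_target = df.corr()[target].abs()
    weak = cor_target[cor_target < threshold].index
    # The target correlates perfectly with itself, so it is never dropped
    # as long as threshold <= 1.
    return df.drop(columns=weak)


# Synthetic stand-in data (hypothetical columns, not the Boston dataset)
rng = np.random.default_rng(0)
n = 500
strong = rng.normal(size=n)
demo = pd.DataFrame({
    "strong": strong,                      # drives the target
    "noise": rng.normal(size=n),           # unrelated to the target
    "MEDV": 3.0 * strong + rng.normal(scale=0.5, size=n),
})

reduced = drop_weak_features(demo, "MEDV")
print(list(reduced.columns))
```

Using `drop(columns=...)` avoids the positional `axis` argument, which recent pandas versions no longer accept.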

CodePudding user response:

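  1. List the weakly correlated features, then drop them by name:

     weak_features = cor_target[cor_target < 0.4]
     print(weak_features)
     X = df.drop(['MEDV', 'CRIM', 'ZN', 'CHAS', 'AGE', 'DIS', 'RAD', 'B'], axis=1)

  2. Alternatively, loop over the index of the filtered Series to build the list of columns to drop, as in the answer above.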
