I am using the cars.csv dataset and want to find the second most highly correlated column for a given column. For example, ranking the correlations for cars['mpg'] gives 'mpg' itself as the top result. I need 'drat' as the output for 'mpg' as input.
CodePudding user response:
def get_max_correlated_column(a):
    # index[0] is the column itself (correlation 1.0), so take index[1]
    return cars.corr()[a].sort_values(ascending=False).index[1]
get_max_correlated_column('mpg')
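The function above relies on the column itself sorting into position 0. A minimal sketch of an alternative that drops the column explicitly and restricts the matrix to numeric columns (recent pandas versions raise an error in corr() when text columns are present). The DataFrame here is a small hypothetical stand-in for cars.csv, and the name most_correlated_column is made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for cars.csv; values are illustrative only.
cars = pd.DataFrame({
    "model": ["a", "b", "c", "d", "e"],
    "mpg": [21.0, 22.8, 18.7, 14.3, 30.4],
    "drat": [3.90, 3.85, 3.15, 3.21, 4.93],
    "wt": [2.62, 2.32, 3.44, 3.57, 1.62],
})

def most_correlated_column(df, col):
    # Keep only numeric columns so text columns cannot break corr().
    corr = df.select_dtypes(include="number").corr()[col]
    # Drop the column itself, then return the label of the largest remaining value.
    return corr.drop(col).idxmax()

result = most_correlated_column(cars, "mpg")
print(result)
```

Dropping the label avoids depending on sort order when another column is also perfectly correlated with the input.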
CodePudding user response:
I think this is what you are looking for. The example finds the attribute most correlated with Cylinders and returns EngineSize.
import pandas as pd
correlation = (
    pd.read_csv('CARS.csv')
    .corr()["Cylinders"].nlargest(2)[1:]
)
correlation
Output:
EngineSize 0.908002
Name: Cylinders, dtype: float64
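Note that nlargest ranks signed values, so a strongly negative correlation ends up last. If the strength of the relationship matters more than its direction, one possible variation is to rank by absolute value. This is a sketch on made-up data, not the contents of CARS.csv:

```python
import pandas as pd

# Hypothetical data; MPG_City is constructed to fall linearly with Cylinders.
df = pd.DataFrame({
    "Cylinders": [4, 4, 6, 6, 8, 8],
    "EngineSize": [1.8, 2.0, 3.0, 3.5, 4.6, 5.0],
    "MPG_City": [28, 28, 22, 22, 16, 16],
})

# Correlations with Cylinders, excluding Cylinders itself.
corr = df.corr()["Cylinders"].drop("Cylinders")

signed_best = corr.idxmax()          # largest positive correlation
strongest = corr.abs().idxmax()      # largest magnitude, either sign
print(signed_best, strongest)
```

Here the signed ranking picks EngineSize, while the absolute ranking picks MPG_City because its negative correlation is stronger.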