How to find the 2nd most highly correlated column/variable

I am using the cars.csv dataset and want to find the second most highly correlated column for a given column. For example, the top correlation for 'mpg' is 'mpg' itself, since every column correlates perfectly with itself. I need 'drat' as the output for 'mpg' as the input.

CodePudding user response：

def get_max_correlated_column(a):
    # Position 0 after sorting is the column itself (correlation 1.0), so take position 1.
    return cars.corr()[a].sort_values(ascending=False).index[1]

get_max_correlated_column('mpg')

CodePudding user response：

I think this is what you are looking for. The example looks for the attribute that is most correlated with Cylinders (other than Cylinders itself) and returns EngineSize.

import pandas as pd

correlation = (
    pd.read_csv('CARS.csv')
    .corr()["Cylinders"].nlargest(2)[1:]
)
correlation

Output:

EngineSize    0.908002
Name: Cylinders, dtype: float64
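
Both answers above rely on the column itself landing in first place after sorting. A slightly more defensive variant, shown here only as a sketch, drops the column by name before taking the maximum. The function name most_correlated, the use_abs flag, and the demo frame are made up for illustration and are not part of cars.csv. Selecting only numeric columns first is an assumption that helps when the file also contains text columns.

```python
import pandas as pd


def most_correlated(df: pd.DataFrame, column: str, use_abs: bool = False) -> str:
    """Return the name of the column most correlated with `column`, excluding itself."""
    # Keep numeric columns only, since correlation is undefined for text columns.
    numeric = df.select_dtypes(include="number")
    # Drop the column's own entry (always 1.0) instead of relying on its sort position.
    corr = numeric.corr()[column].drop(labels=column)
    if use_abs:
        # Optionally rank by strength so strong negative correlations count too.
        corr = corr.abs()
    return corr.idxmax()


# Small made-up frame (NOT the real cars.csv), just to exercise the function.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 5, 4, 5],
    "c": [5, 4, 3, 2, 1],
    "label": list("vwxyz"),
})

print(most_correlated(demo, "a"))                # highest signed correlation
print(most_correlated(demo, "a", use_abs=True))  # strongest correlation of either sign
```

With the default use_abs=False the ranking is by signed correlation, which matches the behaviour of the answers above. Setting use_abs=True ranks by strength instead, so a strongly negative column can win.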