How to show best correlation result in dataframe?

Time:05-21

I'd like to show the strongest correlations with my target variable. My code produces the full correlation matrix, but I would like the results shown in a table sorted from largest to smallest, ideally with a plot. How can I do this? I want the results for SalePrice only, not the full table below.

korelacja = train.corr()
sns.heatmap(korelacja, vmax=0.9, square=True,cmap='coolwarm')
a = korelacja[korelacja['SalePrice']>0.3]
a

CodePudding user response:

A really simple example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(-100, 100, (10000, 8)), columns=list("ABCDEFGH"))

df.corr()["H"].loc[df.columns != "H"].\
    sort_values(ascending=False).plot(kind="bar")

This does everything in a single chained expression instead of in separate steps as in your code. Split into steps:

# Correlation matrix
df1 = df.corr()
# Column H of the data frame, and remove row H (as this will obviously be 1, the highest)
df1 = df1["H"].loc[df1.columns != "H"]
# sort the values in descending order (the method is sort_values, not sort_value)
df1.sort_values(ascending=False, inplace=True)
# plot a bar chart
df1.plot(kind="bar")

Using your data specifically:

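# correlation matrix (numeric_only=True skips text columns; pandas 2.0+ raises an error on them otherwise)
korelacja = train.corr(numeric_only=True)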
# SalePrice column only
korelacja = korelacja["SalePrice"].loc[korelacja.columns != "SalePrice"]
# sort values
korelacja.sort_values(ascending=False, inplace=True)
# condition
korelacja = korelacja[korelacja > 0.3]
# plot bar chart (the keyword is kind, not king)
korelacja.plot(kind="bar")
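Since the question also asks for a table sorted from largest to smallest, here is a minimal sketch of the same steps wrapped in a function that returns a one-column data frame. The helper name top_correlations, the column label "correlation", and the synthetic demo data standing in for train are all assumptions made for illustration; they are not part of the original question or answer.

```python
import numpy as np
import pandas as pd


def top_correlations(df, target, threshold=0.3):
    """Hypothetical helper: correlations with `target`, largest first."""
    # numeric_only=True ignores non-numeric columns (available in pandas 1.5+)
    corr = df.corr(numeric_only=True)[target]
    # drop the target's correlation with itself, which is always 1
    corr = corr.drop(target)
    # keep only values above the threshold, then sort in descending order
    corr = corr[corr > threshold].sort_values(ascending=False)
    # return a one-column table instead of a bare Series
    return corr.to_frame(name="correlation")


# Synthetic stand-in for `train` (assumed data, for illustration only)
rng = np.random.default_rng(0)
n = 1000
area = rng.normal(size=n)
quality = rng.normal(size=n)
demo = pd.DataFrame({
    "Area": area,
    "Quality": quality,
    "Noise": rng.normal(size=n),
    "SalePrice": 3 * area + 2 * quality + 0.5 * rng.normal(size=n),
})

table = top_correlations(demo, "SalePrice")
print(table)
```

With your own data this would be called as top_correlations(train, "SalePrice"), and table["correlation"].plot(kind="bar") should draw the same bar chart as above. Note that the > 0.3 filter keeps only positive correlations; strongly negative ones are dropped, just as in the original condition.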