I have a DataFrame with 24 columns. I have to show the correlation among all the variables graphically, and I have to find the two variables most correlated with the variable "Price". I found a way to show the correlations of this DataFrame as a graph:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
# Create the correlation:
df4 = df_csv.corr()
# Show the graph:
sn.heatmap(df4, annot=True)
plt.show()
But the graph is so crowded with values that I cannot tell at a glance which two variables are most correlated with "Price". How can I filter the result so that the two variables most correlated with "Price" are easier to find?
CodePudding user response:
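I think you are almost there. You already have all the values in your df4. Select the column of interest (here "Price"), sort the values, and take the first three rows. The first row is the column itself, whose correlation with itself is 1, so the next two rows are the variables you are looking for: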
df4["col_of_interest"].sort_values(key=abs, ascending=False).head(3)
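As Mustafa Aydin mentioned in the comments, this approach assumes that the strength of a correlation does not depend on its sign, so a strong negative correlation counts as much as a strong positive one. If you only want the largest positive correlations, remove the key=abs part.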
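For illustration, here is a minimal self-contained sketch of the same idea. The data and the column names other than "Price" are made up, since the original df_csv is not shown, and dropping the "Price" row before sorting is a small variation on the line above, so that head(2) returns the two variables directly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_csv; only the "Price" column name comes from the question.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(100, 20, n)
age = rng.normal(30, 10, n)
df_csv = pd.DataFrame({
    "Price": 3 * area - 2 * age + rng.normal(0, 5, n),
    "Area": area,
    "Rooms": area / 25 + rng.normal(0, 0.5, n),
    "Age": age,
    "Noise": rng.normal(0, 1, n),
})

# Correlation matrix over the numeric columns only.
df4 = df_csv.select_dtypes(include="number").corr()

# Keep the "Price" column and drop its correlation with itself (always 1).
corr_with_price = df4["Price"].drop("Price")

# Sort by absolute value so strong negative correlations rank as high as positive ones.
top_two = corr_with_price.sort_values(key=abs, ascending=False).head(2)
print(top_two)
```

If a picture is still required, passing a one-column frame such as top_two.to_frame() (or corr_with_price.to_frame()) to sn.heatmap should give a much less crowded plot than the full 24 x 24 matrix.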