To list all of the top correlations, you can use the following code, based on the answers to "List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?":
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [7, 3]}
df = pd.DataFrame(data=d)
df.corr().unstack().sort_values().drop_duplicates()
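To make clear what that line operates on, here is a small sketch using the same toy DataFrame. unstack() turns the square correlation matrix into a Series whose index holds (column, column) pairs, so every off-diagonal pair shows up twice. Note that drop_duplicates() compares correlation values, not pairs. The variable names pairs and top are only illustrative.

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [7, 3]}
df = pd.DataFrame(data=d)

# unstack() flattens the 3x3 correlation matrix into a Series with a
# two-level index of (column, column) pairs, e.g. ('col1', 'col2').
pairs = df.corr().unstack()
print(pairs)

# The matrix is symmetric, so ('col1', 'col2') and ('col2', 'col1')
# hold the same value.
print(pairs[('col1', 'col2')], pairs[('col2', 'col1')])

# drop_duplicates() removes repeated *values*: with only two rows every
# correlation here is +1.0 or -1.0, so only two entries survive.
top = pairs.sort_values().drop_duplicates()
print(top)
```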
How do I have to change the line df.corr().unstack().sort_values().drop_duplicates() so that it compares just one specific column with all the others?
I do not want to compare col2 to col3. Only the correlations of col1 with col2 and of col1 with col3 matter to me.
Answer:
You can first compute the full correlation matrix with df.corr().
After that, select the row of the matrix returned by df.corr() that belongs to the column you are interested in.
Say you are interested in the correlations between col1 and the other columns:
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [7, 3]}
df = pd.DataFrame(data=d)
df.corr().loc['col1']
# col1    1.0
# col2    1.0
# col3   -1.0
# Name: col1, dtype: float64
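The selected row still contains col1's correlation with itself, which is always 1.0. As a sketch going one step further, assuming the same df, here are two ways to keep only col1 against the other columns. The first drops the self-correlation from the row and sorts it like the question does. The second uses pandas' DataFrame.corrwith, which correlates each remaining column with the given Series and never builds the full matrix. The name target is a hypothetical variable chosen for illustration.

```python
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [7, 3]}
df = pd.DataFrame(data=d)

target = 'col1'  # hypothetical: the one column to compare against all others

# Option 1: take the target's row of the full correlation matrix, drop the
# trivial self-correlation, and sort as in the question.
row = df.corr().loc[target].drop(target).sort_values()
print(row)

# Option 2: correlate every other column directly with the target column.
direct = df.drop(columns=target).corrwith(df[target]).sort_values()
print(direct)
```

For this toy data both should report col2 close to 1.0 and col3 close to -1.0. The two approaches can differ in the last floating-point digits.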