I've been using Pandas .corr() with Seaborn to generate a correlation heatmap, but I want to switch to Pingouin's .rcorr() for a bit more control.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pingouin as pg
df_correlation = df.rcorr(method='spearman')
This gives a df similar to below (taken directly from the Pingouin documentation as an example).
              Neuroticism Extraversion Openness Agreeableness
Neuroticism             -          ***                     **
Extraversion        -0.35            -      ***
Openness            -0.01        0.265        -           ***
Agreeableness      -0.134        0.054    0.161             -
When using Pandas .corr(), I've been able to plot the heatmap directly using Seaborn and then mask the upper triangle, but this doesn't work here because of the *** strings.
I'm looking for a way to plot this Pingouin-derived data as a heatmap by taking just the numbers (a bonus if the * markers can be included in the upper triangle).
My current 'fix' is to use a series of .replace() calls to change '-' to '1', etc., but this does not seem like a great solution:
df_correlation.replace(['-'], 1)
Answer:
You could do something as follows:
import numpy as np
import pandas as pd
import pingouin as pg
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()  # optional: leave out for a white background behind '***' and empty cells
# https://pingouin-stats.org/generated/pingouin.rcorr.html#pingouin.rcorr
df = pg.read_dataset('pairwise_corr').iloc[:, 1:]
df_correlation = df.rcorr(method='spearman')
df_correlation = df_correlation.replace('-','1.0')
# create mask (see source below in answer)
mask = np.zeros_like(df_correlation, dtype=bool)
mask[np.tril_indices_from(mask)] = True
# apply mask, and set type to float
ax = sns.heatmap(df_correlation.where(mask).astype(float),
                 annot=True, fmt="g", cmap='YlGnBu')
# invert mask for the labels ('***' vals)
labels = df_correlation.where(~mask).to_numpy()
# add individual labels using `ax.text` (see source below in answer)
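for (j, i), label in np.ndenumerate(labels):
    # only the upper triangle holds strings; masked cells are NaN
    if isinstance(label, str):
        # heatmap cells are centred at (column + 0.5, row + 0.5)
        ax.text(i + 0.5, j + 0.5, label,
                fontdict=dict(ha='center', va='center',
                              color='black', fontsize=20))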
plt.show()
Result:
SO sources:
- On generating the mask, see Python generate a mask for the lower triangle of a matrix.
- On the annotations, see Exclude a column from Seaborn Heatmap formatting, but keep in the map.
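If chaining .replace() calls still feels fragile, another option is to let pd.to_numeric coerce every non-numeric cell to NaN and split the table into a numeric frame and a star frame. The following is only a minimal sketch: the table is typed in by hand from the example in the question rather than produced by Pingouin, and split_rcorr is a hypothetical helper name, not part of any library.

```python
import numpy as np
import pandas as pd

# Hand-typed stand-in for the rcorr() output shown in the question
# (an assumption for illustration; every cell is a string).
names = ["Neuroticism", "Extraversion", "Openness", "Agreeableness"]
rcorr_like = pd.DataFrame(
    [["-", "***", "", "**"],
     ["-0.35", "-", "***", ""],
     ["-0.01", "0.265", "-", "***"],
     ["-0.134", "0.054", "0.161", "-"]],
    index=names, columns=names,
)


def split_rcorr(table):
    """Hypothetical helper: split an rcorr-style table into
    (values, stars). values holds floats in the lower triangle and
    1.0 on the diagonal; stars holds the upper-triangle strings."""
    ones = np.ones(table.shape, dtype=bool)
    lower = np.tril(ones, k=-1)  # strictly below the diagonal
    upper = np.triu(ones, k=1)   # strictly above the diagonal

    # '-', '***' and '' cannot be parsed, so they become NaN
    numeric = table.apply(pd.to_numeric, errors="coerce").to_numpy(dtype=float)

    vals = np.where(lower, numeric, np.nan)
    np.fill_diagonal(vals, 1.0)  # a variable correlates perfectly with itself
    values = pd.DataFrame(vals, index=table.index, columns=table.columns)

    # keep only the significance stars, blank everywhere else
    stars = table.where(upper, "")
    return values, stars


values, stars = split_rcorr(rcorr_like)
```

Here values could be passed to sns.heatmap(values, annot=True, fmt="g"), and the non-empty entries of stars drawn with ax.text at (column + 0.5, row + 0.5), in the same way as the loop in the answer.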