Pingouin rcorr heatmap

I’ve been using Pandas .corr() with Seaborn to generate a heatmap showing correlation, but I want to switch to Pingouin’s .rcorr() for a bit more control.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pingouin as pg


df_correlation = df.rcorr(method='spearman')

This gives a DataFrame similar to the one below (taken directly from the Pingouin documentation as an example).

              Neuroticism Extraversion Openness Agreeableness
Neuroticism             -          ***                     **
Extraversion        -0.35            -      ***
Openness            -0.01        0.265        -           ***
Agreeableness      -0.134        0.054    0.161             -

When using Pandas .corr() I've been able to plot the heatmap directly using Seaborn and then mask the upper triangle, but this doesn't work here because of the *** strings.

I'm looking for a way to plot this Pingouin-derived data as a heatmap using just the numbers (with a bonus if the * markers can be included in the upper triangle).

My current 'fix' for this is to use a series of .replace() modifications to change '-' to '1' etc., but this does not seem like a great solution:

df_correlation = df_correlation.replace('-', 1)

CodePudding user response:

You could do something as follows:

import numpy as np
import pandas as pd
import pingouin as pg
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()  # optional; leave out for a white background behind the '***' and empty cells

# https://pingouin-stats.org/generated/pingouin.rcorr.html#pingouin.rcorr
df = pg.read_dataset('pairwise_corr').iloc[:, 1:]

df_correlation = df.rcorr(method='spearman')
df_correlation = df_correlation.replace('-','1.0')

# create mask (see source below in answer)
mask = np.zeros_like(df_correlation, dtype=bool)
mask[np.tril_indices_from(mask)] = True

# apply mask, and set type to float
ax = sns.heatmap(df_correlation.where(mask).astype(float), 
                 annot=True, fmt="g", cmap='YlGnBu')

# invert mask for the labels ('***' vals)
labels = df_correlation.where(~mask).to_numpy()

# add individual labels using `ax.text` (see source below in answer)
for (j, i), label in np.ndenumerate(labels):
    if isinstance(label, str):
        # cell centres sit at half-integer coordinates (x = column, y = row)
        ax.text(i + 0.5, j + 0.5, label,
                fontdict=dict(ha='center', va='center',
                              color='black', fontsize=20))
        
plt.show()

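Result: (image not shown) a heatmap with the coefficients annotated in the lower triangle and the significance stars in the upper triangle.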

SO sources:

On generating the mask, see "Python generate a mask for the lower triangle of a matrix".
On the annotations, see "Exclude a column from Seaborn Heatmap formatting, but keep in the map".
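
If you would rather avoid chaining .replace() calls, one possible alternative is to split the table into a numeric part and a "stars" part up front. The sketch below is only an illustration: `rcorr_like` is a hand-built stand-in for the rcorr() output shown in the question (it assumes every cell is a string), and `split_rcorr` is a hypothetical helper name, not part of Pingouin.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the rcorr() table in the question.
# Assumption: every cell is a string, as in the documentation example.
cols = ["Neuroticism", "Extraversion", "Openness", "Agreeableness"]
rcorr_like = pd.DataFrame(
    [
        ["-", "***", "", "**"],
        ["-0.35", "-", "***", ""],
        ["-0.01", "0.265", "-", "***"],
        ["-0.134", "0.054", "0.161", "-"],
    ],
    index=cols,
    columns=cols,
)


def split_rcorr(table):
    """Hypothetical helper: return (coefficients, stars) from an rcorr-style table."""
    # Anything that is not a number ('-', '***', '') becomes NaN.
    vals = table.apply(pd.to_numeric, errors="coerce").to_numpy(dtype=float, copy=True)
    # The diagonal is a variable correlated with itself.
    np.fill_diagonal(vals, 1.0)
    coefs = pd.DataFrame(vals, index=table.index, columns=table.columns)

    # Keep only the strictly upper triangle for the significance markers.
    upper = np.triu(np.ones(table.shape, dtype=bool), k=1)
    stars = table.where(upper, "")
    return coefs, stars


coefs, stars = split_rcorr(rcorr_like)
```

`coefs` is then a plain float DataFrame with NaN in the upper triangle, so it can be passed to sns.heatmap(coefs, annot=True) directly (Seaborn leaves NaN cells blank), and the non-empty entries of `stars` can be drawn with the same ax.text loop as above.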