Seaborn countplot with group order

Time:02-23

I am trying to draw a count plot using seaborn and matplotlib. Within each year, I want to sort the "Drought types" bars by their count so that the plot looks better. Currently the bars are unsorted within each year and look very messy. Thank you!

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
count = pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")

plt.figure(figsize=(15,8))
sns.countplot(x= 'Year', hue = 'Drought types', data = count, palette = 'YlOrRd')

plt.legend(loc = "best",frameon=True,bbox_to_anchor=(0.9,0.75))
plt.show()

CodePudding user response:

The following approach draws the years one-by-one. order= is used to fix the order of the years. hue_order is recalculated for each individual year (.reindex() is needed to make sure all drought_types are present).
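As a minimal sketch of the per-year hue_order computation, the snippet below uses a small made-up DataFrame in place of count.csv (the rows and the type labels are assumptions; only the column names come from the question). It shows that .reindex() adds the types missing in a year as NaN, and that sort_values() places those NaN entries last by default.

```python
import pandas as pd

# Hypothetical toy data standing in for count.csv (same column names as the question)
toy = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Drought types": ["Mild", "Severe", "Severe", "Mild", "Mild", "Mild", "Extreme"],
})
drought_types = toy["Drought types"].unique()

hue_orders = {}
for year, year_data in toy.groupby("Year"):
    sizes = year_data.groupby("Drought types").size()
    # reindex adds the types that do not occur in this year (their count becomes NaN)
    sizes = sizes.reindex(drought_types)
    # ascending sort; NaN entries are placed last by default
    hue_orders[int(year)] = list(sizes.sort_values(ascending=True).index)

for year, order in hue_orders.items():
    print(year, order)
```

Each year gets its own ordering, which is why the countplot has to be drawn once per year rather than in a single call.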

A dictionary palette is used to make sure each hue value gets the same color, independent of the order. The automatic legend repeats all hue values for each year, so the legend needs to be reduced.
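A dictionary palette can be sketched in isolation as follows. The three labels are hypothetical placeholders; in the real data they would come from count['Drought types'].unique().

```python
import seaborn as sns

# Hypothetical labels; the real ones come from the 'Drought types' column
drought_types = ["Mild", "Moderate", "Severe"]

# One color per label, taken from the same colormap as in the question
colors = sns.color_palette("YlOrRd", len(drought_types))

# Mapping label -> color, so a label keeps its color whatever the hue_order is
palette = dict(zip(drought_types, colors))
```

Because the mapping is keyed by label, passing a different hue_order for each year only changes the bar positions, not the colors.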

By the way, loc='best' shouldn't be used together with bbox_to_anchor in the legend, because the legend can jump to a very different position after small changes in the data. With loc='best', matplotlib picks one of the 9 possible locations depending on the available space, and bbox_to_anchor is then applied relative to whichever location was picked.
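A small sketch of the safer combination, on a made-up bar chart (the data and the label are placeholders): with an explicit loc, the named corner of the legend box is pinned to the bbox_to_anchor point, given in axes coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [3, 1, 2], label="example bars")  # placeholder data

# loc names the corner of the legend box that is pinned;
# bbox_to_anchor is the point (in axes coordinates) it is pinned to
legend = ax.legend(loc="upper right", bbox_to_anchor=(0.995, 0.99), frameon=True)
fig.canvas.draw()
```

Here the upper right corner of the legend stays just inside the upper right corner of the axes, regardless of the bar heights.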

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

count = pd.read_csv("https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")

fig, ax = plt.subplots(figsize=(15, 8))

drought_types = count['Drought types'].unique()
palette = {drought_type: color
           for drought_type, color in zip(drought_types, sns.color_palette('YlOrRd', len(drought_types)))}
all_years = range(count['Year'].min(), count['Year'].max() + 1)
sns.set_style('darkgrid')
for year in all_years:
    year_data = count[count['Year'] == year]
    if len(year_data) > 0:
        # reindex is needed to make sure all drought_types are present
        hue_order = year_data.groupby('Drought types').size().reindex(drought_types).sort_values(ascending=True).index
        sns.countplot(x='Year', order=all_years,
                      hue='Drought types', hue_order=hue_order,
                      data=year_data, palette=palette, ax=ax)

# handles, _ = ax.get_legend_handles_labels()
# handles = handles[:len(drought_types)]
handles = [plt.Rectangle((0, 0), 0, 0, color=palette[drought_type], label=drought_type)
           for drought_type in drought_types]
ax.legend(handles=handles, loc="upper right", frameon=True, bbox_to_anchor=(0.995, 0.99))
plt.show()

sns.countplot, hues reordered per x value
