Why is the category order irregular with seaborn displot?

Time:03-07

I used the script:

sns.displot(data=df, x='New Category', height=5, aspect=3, kde=True)

but the categories are not plotted in a regular order, as the picture shows. I want the order to be like this:

  • Less than 2 hours
  • Between 1 to 2 hours
  • Between 2 to 4 hours
  • Between 4 to 6 hours
  • Between 6 to 12 hours
  • More than 12 hours

The result of the script:

[image: the result of the script]

CodePudding user response:

The easiest way to fix the order is via pd.Categorical:

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# first, create some test data
categories = ['Less than 2 hours', 'Between 1 to 2 hours', 'Between 2 to 4 hours',
              'Between 4 to 6 hours', 'Between 6 to 12 hours', 'More than 12 hours']
weights = np.random.rand(len(categories)) + 0.1
weights /= weights.sum()
df = pd.DataFrame({'New Category': np.random.choice(categories, 1000, p=weights)})

# fix an order on the column via pd.Categorical
df['New Category'] = pd.Categorical(df['New Category'], categories=categories, ordered=True)

# displot now uses the fixed order 
sns.displot(data=df, x='New Category', height=5, aspect=3, kde=True)
plt.show()

[image: displot with the fixed category order]
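To check that the order is really fixed without looking at a plot, the categorical column can be inspected directly. This is only a minimal sketch: the four-row sample is made up for illustration, and it uses plain pandas rather than seaborn.

```python
import pandas as pd

categories = ['Less than 2 hours', 'Between 1 to 2 hours', 'Between 2 to 4 hours',
              'Between 4 to 6 hours', 'Between 6 to 12 hours', 'More than 12 hours']

# small made-up sample, deliberately out of order
df = pd.DataFrame({'New Category': ['More than 12 hours', 'Between 2 to 4 hours',
                                    'Less than 2 hours', 'Between 2 to 4 hours']})
df['New Category'] = pd.Categorical(df['New Category'], categories=categories, ordered=True)

# sort=False keeps the categorical order instead of ordering by frequency;
# categories that do not occur in the data are listed with a count of 0
counts = df['New Category'].value_counts(sort=False)
print(counts)

# sorting the column now follows the categorical order, not the alphabet
sorted_labels = df.sort_values('New Category')['New Category'].tolist()
print(sorted_labels)
```

If the printed index follows the desired order, the column carries that order with it and a plot built from it should do the same.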

CodePudding user response:

The reason is the order in which the values appear in the original df:

import pandas as pd
import seaborn as sns

df = pd.DataFrame({'test': ['Less than 2 hours', 'Less than 2 hours', 'Less than 2 hours', 'Less than 2 hours', 'Between 1 to 2 hours', 'Between 2 to 4 hours', 'Between 4 to 6 hours', 'Between 6 to 12 hours', 'More than 12 hours', 'More than 12 hours']})
sns.displot(data=df, x='test', height=5, aspect=3, kde=True)

result: the bars follow the order in which the categories first appear in the column, which here is the desired order.

While:

import pandas as pd
import seaborn as sns

df = pd.DataFrame({'test': ['Less than 2 hours', 'Less than 2 hours', 'Less than 2 hours', 'Less than 2 hours', 'Between 1 to 2 hours', 'Between 2 to 4 hours', 'Between 4 to 6 hours', 'More than 12 hours', 'More than 12 hours', 'Between 6 to 12 hours']})
sns.displot(data=df, x='test', height=5, aspect=3, kde=True)

result: 'More than 12 hours' is now drawn before 'Between 6 to 12 hours', because it appears first in the data.

So, sort the rows into the desired order before plotting:

# map each category label to its rank in the desired order
mapping = {'Less than 2 hours': 0, 'Between 1 to 2 hours': 1, 'Between 2 to 4 hours': 2, 'Between 4 to 6 hours': 3, 'Between 6 to 12 hours': 4, 'More than 12 hours': 5}
df['ord'] = df['test'].map(mapping)
# sort the rows so that the categories first appear in the desired order
df = df.sort_values('ord')
sns.displot(data=df, x='test', height=5, aspect=3, kde=True)
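The same sorting idea can probably be written without the helper column, by passing a key function to sort_values (this needs pandas 1.1 or later). The sketch below is a variation on the answer above, not part of it: the names order, rank, sorted_df and first_seen are made up for illustration, and the sample rows are invented.

```python
import pandas as pd

order = ['Less than 2 hours', 'Between 1 to 2 hours', 'Between 2 to 4 hours',
         'Between 4 to 6 hours', 'Between 6 to 12 hours', 'More than 12 hours']
# rank of each label in the desired order
rank = {label: i for i, label in enumerate(order)}

# made-up rows, deliberately shuffled
df = pd.DataFrame({'test': ['More than 12 hours', 'Less than 2 hours',
                            'Between 6 to 12 hours', 'Between 1 to 2 hours',
                            'Between 4 to 6 hours', 'Between 2 to 4 hours',
                            'Less than 2 hours']})

# sort by rank instead of alphabetically; no extra 'ord' column is stored
sorted_df = df.sort_values('test', key=lambda col: col.map(rank))

# order of first appearance after sorting, which is what the plot relies on
first_seen = list(pd.unique(sorted_df['test']))
print(first_seen)
```

Passing sorted_df to sns.displot in place of df should then give the same result as the sorted frame with the ord column.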