Remove data of type category from plot

Time:12-14

Say we have a df with a column defined as a category:

import pandas as pd
df = pd.DataFrame({'Color': ['Yellow', 'Blue', 'Red', 'Red']}, dtype='category') # data type is category

Now say we want to plot these data while removing one of the categorical levels:

# Exclude Yellow, save in new df    
df2 = df.loc[df.Color != 'Yellow']
# Plot
df2.value_counts().plot(kind='bar')

Output: [bar plot image not available]

Although the bar for Yellow is not displayed, the Yellow tick label is still visible.

My question: How do we completely remove Yellow from the plot?

I suspect this happens because the data type is category. However, I don't want to convert the data type, because category is sometimes useful, e.g., for reordering levels and other operations.
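A quick check with the question's data supports this suspicion. Filtering removes the Yellow rows but not the Yellow category, and `value_counts()` on a categorical column reports every category, including unused ones with a count of 0. This minimal sketch uses the `Color` Series rather than `df2.value_counts()` on the whole frame as in the question:

```python
import pandas as pd

# Same data as in the question: Color is a categorical column
df = pd.DataFrame({'Color': ['Yellow', 'Blue', 'Red', 'Red']}, dtype='category')
df2 = df.loc[df.Color != 'Yellow']

# Filtering drops the rows, but Yellow is still one of the column's categories
print(df2['Color'].cat.categories.tolist())  # ['Blue', 'Red', 'Yellow']

# value_counts on a categorical reports every category, unused ones included
counts = df2['Color'].value_counts()
print(counts['Yellow'])  # 0
```

That zero-count Yellow entry is what produces the empty tick in the bar plot.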

The ideal solution would also work with seaborn, where I ran into a similar issue:

# Remake a df based on the above and plot with seaborn
import seaborn as sns
from matplotlib import pyplot as plt
df2 = df2.value_counts().reset_index(name='count')  # columns: Color, count
sns.catplot(data=df2, x='count', y='Color', kind='bar')
plt.show()

Output: [seaborn bar plot image not available]

Dani Mesejo's answer works, but I believe only with histograms, and I specifically need bar plots.

Answer:

Using a bar plot.

Note: don't forget to import seaborn:

import seaborn as sns
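A minimal sketch of a bar-plot approach that keeps the category dtype, using pandas' `Series.cat.remove_unused_categories()` after filtering. This is not necessarily the code this answer had in mind, and the `matplotlib.use('Agg')` line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; drop it and call plt.show() interactively
import pandas as pd

df = pd.DataFrame({'Color': ['Yellow', 'Blue', 'Red', 'Red']}, dtype='category')

# Filter the rows, then drop categories that no longer occur.
# .copy() avoids a SettingWithCopyWarning on the assignment below.
df2 = df.loc[df.Color != 'Yellow'].copy()
df2['Color'] = df2['Color'].cat.remove_unused_categories()

# The column is still categorical, but Yellow is no longer one of its levels
print(df2['Color'].dtype)                    # category
print(df2['Color'].cat.categories.tolist())  # ['Blue', 'Red']

# value_counts now has no zero-count Yellow entry, so no empty tick is drawn
ax = df2['Color'].value_counts().plot(kind='bar')
labels = [t.get_text() for t in ax.get_xticklabels()]
```

seaborn's categorical plots take their level order from a categorical column's categories, so the same step should also remove the empty Yellow slot in `sns.catplot` or `sns.countplot` (not exercised in this sketch).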

Answer:

You can convert the categorical values to strings just for the plot. The conversion is not done in place, so the data types in df2 stay the same:

df2 = df[df['Color'] != 'Yellow']
df2.Color.astype(str).value_counts().plot(kind='bar') 
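A quick check, using the question's data, that this conversion only affects the Series passed to the plot: the string counts contain just the observed colors, while `df2` keeps its categorical column, unused Yellow category included:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Yellow', 'Blue', 'Red', 'Red']}, dtype='category')
df2 = df[df['Color'] != 'Yellow']

# astype(str) returns a new non-categorical Series; df2 itself is not modified
as_str_counts = df2['Color'].astype(str).value_counts()
print(as_str_counts.index.tolist())  # ['Red', 'Blue']

# The original column is still categorical
print(df2['Color'].dtype)  # category
```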


Alternatively, you can use a histogram:

import matplotlib.pyplot as plt
df2 = df[df['Color'] != 'Yellow']['Color']
plt.hist(df2)
plt.xlabel('Color')
plt.show()

