I'm trying to make a simple categorical barplot to show how many extra-marital affairs people had based on their happiness rating (x axis) and age group (in the legend/key). This is my code.
import pandas as pd
import seaborn as sns
url = 'https://vincentarelbundock.github.io/Rdatasets/csv/AER/Affairs.csv'
affairs = pd.read_csv(url)
age_categorical = []
for row in affairs['age']:
    if 0<row<30:
        age_categorical.append("Under 30")
    elif 30<=row<=40:
        age_categorical.append("30 to 40 Years Old")
    elif 40<row<=55:
        age_categorical.append("41 to 55 years old")
    else:
        age_categorical.append("Older than 55")
affairs['age_categorical'] = age_categorical
# count number of affairs for each happiness rating
affairs_subset = affairs.copy()
affairs_subset = affairs_subset[affairs_subset["affairs_dummy"] != 0]
affairs_rating = affairs_subset.groupby('rating').size()
# create dataframe table for plot
happiness = pd.DataFrame({
'Happiness rating' : [1, 2, 3, 4, 5],
'Number of Affairees': [8, 33, 27, 48, 34]
})
# put in barplot
plot = sns.barplot(x = 'Happiness rating', y='Number of Affairees', hue=affairs['age_categorical'],data = happiness).set(title='Happiness and extra-marital affairs')
The resulting bar plot doesn't show all the categories in the legend.
CodePudding user response:
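The values need to be in the same DataFrame so that they line up row by row. In your call, hue comes from affairs (one entry per respondent) while data is the five-row happiness table, so the two don't match. You also don't need to build the table by hand: group by both rating and age group to create the table you want to plot.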
# keep only respondents who reported at least one affair
affairs_subset = affairs[affairs["affairs"] != 0]
# total affairs per happiness rating and age group, in one table
df = affairs_subset.groupby(['rating','age_categorical'])['affairs'].sum().reset_index()
sns.barplot(data=df, x="rating", y="affairs", hue="age_categorical")
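As a rough sketch of the same idea, the age groups can also be built with pd.cut instead of a loop, and the grouped table checked before it is plotted. The toy rows below are made up for illustration (they are not taken from Affairs.csv), and the bin edges are an assumption chosen to approximate the if/elif ranges in the question.

```python
import pandas as pd

# Made-up rows for illustration only; the real data would come from Affairs.csv.
toy = pd.DataFrame({
    "affairs": [0, 3, 7, 1, 0, 12, 2, 1],
    "age":     [22, 27, 32, 37, 42, 47, 57, 32],
    "rating":  [5, 4, 4, 2, 5, 1, 3, 4],
})

labels = ["Under 30", "30 to 40 Years Old", "41 to 55 years old", "Older than 55"]

# Left-closed bins: [0, 30), [30, 41), [41, 56), [56, inf).
# Assumption: these edges differ from the question's loop only for
# fractional ages strictly between 40 and 41 or between 55 and 56.
toy["age_categorical"] = pd.cut(
    toy["age"], bins=[0, 30, 41, 56, float("inf")], right=False, labels=labels
)

# Keep respondents with at least one affair, then total the affairs
# per (rating, age group). observed=True drops empty combinations.
had_affair = toy[toy["affairs"] != 0]
summary = (
    had_affair.groupby(["rating", "age_categorical"], observed=True)["affairs"]
    .sum()
    .reset_index()
)
print(summary)
```

Because the age group is now a column of the summary table itself, it can be passed by name to hue in the sns.barplot call above, and pd.cut's ordered labels should keep the legend in age order.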