Could you please help me with the following? I have a dataset with a variable, the number of products (Prod), that takes discrete values from 1 to 3 inclusive. I also have a variable (Gender) that is 1 for males and 0 for females. I want to plot a grouped bar chart with the number of products (Prod) on the x-axis and, on the y-axis, the number of observations for each Prod value, split by Gender. For this I need a 'count' variable that counts how many observations of each 'Prod' value fall in each 'Gender' category. To group and plot the variables I use the following code (which does not work):
#Group the variables
grouped_gender['count'] = main_data.groupby(['Prod', 'Gender'])[['Prod']].count()
grouped_gender = pd.DataFrame(grouped_gender)
#Plot
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 7))
barplot2 = sns.barplot(
    data=grouped_gender,
    x='Prod',
    y='count',
    hue='Gender',
    orient='v',
    ax=axes,
    ci=None,
    dodge=False
)
Can you please help me to identify the problem?
CodePudding user response:
Use countplot on the original dataset:
# sample dataset
df = sns.load_dataset('tips')
# `day` plays `Prod`, `sex` plays `Gender`
sns.countplot(x='day', hue='sex', data=df)
Note: if you want the data, not just the plot, use:
counts = pd.crosstab(df['day'], df['sex'])
# then to plot bar chart
# counts.plot.bar()
Printing counts gives you:
sex   Male  Female
day
Thur    30      32
Fri     10       9
Sat     59      28
Sun     58      18
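If you would rather keep the groupby approach from the question, here is a minimal sketch of how the 'count' column could be built. The small main_data frame below is a made-up stand-in (the real data is not shown in the question); only the column names Prod and Gender come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's main_data (values are invented)
main_data = pd.DataFrame({
    "Prod":   [1, 1, 2, 2, 2, 3, 3, 1, 2, 3],
    "Gender": [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
})

# size() counts the rows in each (Prod, Gender) group; reset_index() turns
# the group keys back into ordinary columns and names the counts 'count'
grouped_gender = (
    main_data.groupby(["Prod", "Gender"])
    .size()
    .reset_index(name="count")
)

print(grouped_gender)
```

The likely problems in the original snippet are that grouped_gender is assigned into before it exists, and that the groupby result keeps Prod and Gender in the index rather than as columns, so seaborn cannot find them by name. With the long-format frame above, sns.barplot(data=grouped_gender, x='Prod', y='count', hue='Gender') should work; note that dodge=False draws the bars for both genders at the same x position instead of side by side.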
CodePudding user response:
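Assuming you can get your DataFrame into a layout similar to mine (one row per product, one column per gender), you can draw a stacked bar chart directly with pandas: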
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
grouped_gender = pd.DataFrame(
    {
        "Man": [50, 70],
        "Woman": [90, 30]
    },
    index=["Product1", "Product2"]
)
grouped_gender.plot(kind="bar", stacked=True)
plt.title("Products sales")
plt.xlabel("Products")
plt.ylabel("Sales")
plt.show()
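To get from raw observations to the wide layout above, one option is pd.crosstab. The sketch below assumes the question's coding (Gender 1 for males, 0 for females) and uses an invented main_data frame, since the real data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the asker's main_data (values are invented)
main_data = pd.DataFrame({
    "Prod":   [1, 1, 2, 2, 2, 3, 3, 1, 2, 3],
    "Gender": [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
})

# Rows become Prod values, columns become Gender codes, cells are counts
wide = pd.crosstab(main_data["Prod"], main_data["Gender"])

# Replace the 0/1 codes with readable labels (coding taken from the question)
wide = wide.rename(columns={0: "Woman", 1: "Man"})

# Same stacked bar call as above; call plt.show() afterwards in a script
ax = wide.plot(kind="bar", stacked=True)
ax.set_xlabel("Prod")
ax.set_ylabel("count")
```

Dropping stacked=True gives side-by-side bars instead, which matches the grouped chart described in the question.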