I have a "members" dataframe with a success column and a sex column. I would like to make a histogram showing the percentage of successful women and the percentage of successful men. I can get the number of successes for each sex, but these counts need to be divided by the total number of men and women, otherwise they are not representative. Do you have any idea how to do this?
Code:
- dataframe
import pandas as pd
import seaborn as sns

members = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
print(members)
- unsuccessful attempt
sns.histplot(data=members, x="sex", y="success", stat='percent')
CodePudding user response:
To create a graph showing the percentage of each success value by sex, I grouped the rows by sex and computed the normalized counts of success with value_counts(normalize=True), then multiplied by 100 and reset the index to get a DataFrame.
import pandas as pd
import seaborn as sns

members = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
df_members = members.groupby('sex')['success'].value_counts(normalize=True).mul(100).rename('percent').reset_index()
df_members
Then I used sns.catplot to put sex on the X axis and percent on the Y axis, separating by success.
sns.catplot(data=df_members, x='sex', y='percent', kind='bar', hue='success')
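To check the normalization without downloading the file, here is a minimal sketch on a tiny made-up sample. The `sample` frame and its values are invented for illustration, and it assumes success is a boolean column, as it appears to be in the real data. It also shows a shortcut, not part of the answer above, for when only the success rate per sex is needed.

```python
import pandas as pd

# Tiny made-up sample standing in for the real members data (hypothetical values).
sample = pd.DataFrame({
    "sex": ["F", "F", "F", "F", "M", "M", "M", "M", "M"],
    "success": [True, False, True, True, False, False, True, False, False],
})

# Same normalization as above: share of each success value within each sex.
percent_by_sex = (
    sample.groupby("sex")["success"]
    .value_counts(normalize=True)
    .mul(100)
    .rename("percent")
    .reset_index()
)
print(percent_by_sex)

# Shortcut when only the success rate matters: the mean of a boolean
# column is the fraction of True values.
success_rate = sample.groupby("sex")["success"].mean().mul(100)
print(success_rate)
```

In this sample, 3 of 4 women and 1 of 5 men succeed, so the rates are 75% and 20%. The raw success counts (3 and 1) would not show that, because the two groups differ in size.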