Hello, I would like to know the percentage of each gender who saw a movie.
data:
import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6], 'gender': ['male', 'male', 'male', 'male', 'male', 'female'], 'seen': ['yes', 'yes', 'yes', 'yes', 'no', 'no']}
df = pd.DataFrame(data=d)
df
ID gender seen
0 1 male yes
1 2 male yes
2 3 male yes
3 4 male yes
4 5 male no
5 6 female no
This is what I tried. It only gives me the overall percentage of people who did or did not see the movie, but I would like the percentage who saw the movie broken down by gender.
c4 = (df.groupby(['seen'])
        .agg(counts=('ID', 'size'))
        .reset_index()
        .assign(percent=lambda x: 100 * (x.counts / x.counts.sum()),
                percent1=lambda x: x.percent.round(0)))
c4
For example, the output should look like:
Male | 75%
Female | 25%
CodePudding user response:
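This is one of the many reasons why storing values we mean to be boolean as non-booleans is unhelpful. Convert 'yes'/'no' to True/False first; the mean of a boolean column within each gender is then the fraction of that gender who saw the movie.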
out = (df.replace({'yes': True, 'no': False})
         .groupby('gender')['seen'].mean())
print(out)
Output:
gender
female 0.0
male 0.8
Name: seen, dtype: float64
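The result above is a fraction rather than a percentage. As a minimal sketch of a variant (not the only way to do it), the comparison `df['seen'].eq('yes')` builds the boolean column directly, without going through `replace`, and the result is scaled to percent. The names `pct_seen` and `viewer_share` are made up for this example, and the second calculation assumes you might instead want each gender's share among the people who saw the movie, which is a different question.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'gender': ['male', 'male', 'male', 'male', 'male', 'female'],
    'seen': ['yes', 'yes', 'yes', 'yes', 'no', 'no'],
})

# Boolean Series: True where the person saw the movie.
saw = df['seen'].eq('yes')

# Percentage of each gender who saw the movie
# (the mean of a boolean is the fraction of True values).
pct_seen = saw.groupby(df['gender']).mean().mul(100).round(1)
print(pct_seen)

# Different question (assumed, in case this is what was meant):
# among the people who saw the movie, what share is each gender?
viewer_share = df.loc[saw, 'gender'].value_counts(normalize=True).mul(100)
print(viewer_share)
```

With the sample data, `pct_seen` gives 80.0 for male and 0.0 for female. `viewer_share` only lists genders that have at least one viewer, so female does not appear in it here.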