create a column based off two columns

Time:07-19

Hello, I would like to know the percentage of each gender who saw a movie.

data:

import pandas as pd

d = {'ID': [1,2,3,4,5,6], 'gender': ['male', 'male','male','male','male','female'], 'seen': ['yes','yes','yes','yes','no','no']}
df = pd.DataFrame(data=d)
df

   ID  gender seen
0   1    male  yes
1   2    male  yes
2   3    male  yes
3   4    male  yes
4   5    male   no
5   6  female   no

This is what I tried. It only gives the overall percentage of people who saw the movie, but I would like to see the percentage who saw the movie by gender.

c4 = df.groupby(['seen'])\
  .agg(counts = ('ID','size'))\
  .reset_index()\
  .assign(percent = lambda x:100* (x.counts / x.counts.sum()),
          percent1 = lambda x : x.percent.round(0))
c4

For example:

Male | 75%

Female | 25%

CodePudding user response:

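This is one of the many reasons why storing values we mean to be boolean as non-booleans is unhelpful. Once 'yes'/'no' are converted to True/False, the mean of 'seen' within each gender is the fraction of that gender who saw the movie.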

out = (df.replace({'yes': True, 'no': False})
         .groupby('gender')['seen'].mean())
print(out)

Output:

gender
female    0.0
male      0.8
Name: seen, dtype: float64
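
The output above is a fraction rather than a percentage. As a minimal sketch of one alternative (not the answerer's code), the comparison `df['seen'].eq('yes')` produces a boolean Series directly, so it does not depend on `replace` inferring a boolean dtype. The names `percent_seen` and `table` are illustrative, and the `pd.crosstab` variant is just another way to view the same numbers.

```python
import pandas as pd

d = {'ID': [1, 2, 3, 4, 5, 6],
     'gender': ['male', 'male', 'male', 'male', 'male', 'female'],
     'seen': ['yes', 'yes', 'yes', 'yes', 'no', 'no']}
df = pd.DataFrame(data=d)

# Compare against 'yes' to get True/False, then average within each gender.
# The mean of a boolean Series is the share of True values.
percent_seen = (df['seen'].eq('yes')
                  .groupby(df['gender'])
                  .mean()
                  .mul(100)
                  .rename('percent_seen'))
print(percent_seen)

# Alternative view: row-normalised cross-tabulation, so each gender's
# 'yes' and 'no' shares add up to 100.
table = pd.crosstab(df['gender'], df['seen'], normalize='index').mul(100)
print(table)
```

With the sample data this should report 80 for male and 0 for female, which matches the 0.8 and 0.0 above scaled by 100.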