I have a DataFrame, df1, that shows the audience rating and the genres of each movie:
movie_id | rating | action | comedy | drama
0        | 4      | 1      | 1      | 1
1        | 5      | 0      | 1      | 0
2        | 3      | 0      | 1      | 1
1 for action means it is an action movie, and 0 means it is not.
I extracted the average rating for a single genre. For action, for example, I did this:
new=df1[df1["action"]==1]
new['rating'].mean()
which gives 4. But now I need the average rating for every genre, which should look like this:
action | comedy | drama
4      | 4      | 3.5
Any advice on how to approach this?
CodePudding user response:
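In your case, we can select the genre columns, use where to turn every 0 into NaN, and then use mul to multiply by the rating. Because mean skips NaN by default, each column averages only the movies in that genre (df here is the DataFrame from the question):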
out = df.loc[:,['action','comedy','drama']].where(lambda x : x==1).mul(df.rating,axis=0).mean()
Out[377]:
action 4.0
comedy 4.0
drama 3.5
dtype: float64
If you would like a DataFrame instead of a Series:
out = out.to_frame().T
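For anyone who wants to run this end to end, here is a minimal sketch of the same where/mul idea split into steps. The sample data is copied from the question; the names genres, flags, rated and wide are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "movie_id": [0, 1, 2],
    "rating": [4, 5, 3],
    "action": [1, 0, 0],
    "comedy": [1, 1, 1],
    "drama": [1, 0, 1],
})

genres = ["action", "comedy", "drama"]  # illustrative helper name

# Keep the 1 flags and turn every 0 flag into NaN.
flags = df[genres].where(df[genres] == 1)

# Multiply each row by that movie's rating; NaN stays NaN.
rated = flags.mul(df["rating"], axis=0)

# mean() skips NaN by default, so each column averages only its own genre.
out = rated.mean()

# One-row DataFrame, matching the layout asked for in the question.
wide = out.to_frame().T
print(wide)
```

Splitting the chain like this makes it easy to inspect flags and rated if a number looks off.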
CodePudding user response:
You can melt the genre columns and filter to only keep values equal to 1. Then group by the genres and calculate the mean.
pd.melt(
    df,
    value_vars=["action", "comedy", "drama"],
    var_name="genre",
    id_vars=["movie_id", "rating"],
).query("value == 1").groupby("genre")["rating"].mean()
which gives
genre
action 4.0
comedy 4.0
drama 3.5
Name: rating, dtype: float64
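If the one-row layout from the question is needed, the melt result can be reshaped afterwards. The following is a rough sketch under the same sample data; the names genres, long_df, means and wide are illustrative, and the reindex step is an optional addition, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "movie_id": [0, 1, 2],
    "rating": [4, 5, 3],
    "action": [1, 0, 0],
    "comedy": [1, 1, 1],
    "drama": [1, 0, 1],
})

genres = ["action", "comedy", "drama"]  # illustrative helper name

# Long format: one row per (movie, genre) pair, with the 0/1 flag in "value".
long_df = pd.melt(
    df,
    id_vars=["movie_id", "rating"],
    value_vars=genres,
    var_name="genre",
)

# Keep only the pairs where the movie belongs to the genre, then average.
means = long_df.query("value == 1").groupby("genre")["rating"].mean()

# Optional: restore the original column order. A genre with no movies
# would be missing after the groupby and shows up as NaN here.
means = means.reindex(genres)

# One-row DataFrame, matching the layout asked for in the question.
wide = means.to_frame().T
print(wide)
```

One difference worth knowing: without the reindex, a genre that no movie belongs to simply disappears from the melt result, whereas the where/mul approach reports it as NaN.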