Extract mean value for each column in Pandas

Time:11-27

I have a DataFrame, df1, that shows the audience's rating of each movie and the genres it belongs to:

movie_id | rating | action | comedy | drama
0        | 4      | 1      | 1      | 1
1        | 5      | 0      | 1      | 0
2        | 3      | 0      | 1      | 1

A 1 in a genre column means the movie belongs to that genre (for example, 1 under action means it is an action movie); 0 means it does not.
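To try the answers below, the example table can be rebuilt as a DataFrame. This is a minimal sketch with the values copied from the table above; since the answers refer to the frame as df rather than df1, it is bound to both names here.

```python
import pandas as pd

# Rebuild the example table: each genre column holds 1 (belongs) or 0 (does not)
df1 = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

# The answers call the same frame df
df = df1
print(df1)
```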

I can extract the average rating for a single genre. For action, for example, I did this:

new = df1[df1["action"] == 1]
new["rating"].mean()

which gives 4.0. But now I need the average rating for every genre, and the result should look like this:

action | comedy | drama
4      | 4      | 3.5

Any advice on how to approach this?
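As a baseline, the single-genre filter above can simply be repeated for each genre column. This is a minimal sketch that rebuilds the example data so it runs on its own; the genres list is an assumption that the genre columns are exactly action, comedy and drama.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

genres = ["action", "comedy", "drama"]  # assumed list of genre columns

# Apply the single-genre filter to each genre and collect the means in a Series
means = pd.Series({g: df1.loc[df1[g] == 1, "rating"].mean() for g in genres})
print(means)
```

This filters the frame once per genre, which is fine for a handful of genres; the answers below do the same work without an explicit loop.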

Answer:

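In your case, select the genre columns, use where to turn every 0 into NaN, and multiply by the rating row by row. Because mean skips NaN by default, each column's mean is then the average rating of only the movies in that genre.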

out = df.loc[:, ['action', 'comedy', 'drama']].where(lambda x: x == 1).mul(df.rating, axis=0).mean()

which gives

action    4.0
comedy    4.0
drama     3.5
dtype: float64
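A different technique, not part of this answer: because the genre flags are 0/1, the mean rating of a genre equals the sum of its movies' ratings divided by the number of its movies, and a matrix product computes those sums for all genres at once. A sketch assuming the example data from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

genres = df[["action", "comedy", "drama"]]

# genres.T.dot(rating): sum of ratings of the movies flagged 1 in each genre
# genres.sum(): number of movies in each genre
out = genres.T.dot(df["rating"]) / genres.sum()
print(out)
```

A genre with no movies gives 0/0, which pandas returns as NaN, matching what the where-based version produces for an all-NaN column.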

If you would like the result as a one-row DataFrame instead of a Series:

out = out.to_frame().T

Answer:

You can melt the genre columns into long format and keep only the rows whose value equals 1. Then group by genre and calculate the mean rating.

import pandas as pd

pd.melt(
    df,
    value_vars=["action", "comedy", "drama"],
    var_name="genre",
    id_vars=["movie_id", "rating"],
).query("value == 1").groupby("genre")["rating"].mean()

which gives

genre
action    4.0
comedy    4.0
drama     3.5
Name: rating, dtype: float64
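To see what the groupby operates on, it can help to stop before it and look at the long table. This sketch rebuilds the example data and runs only the melt and query steps; value is the default column name melt gives to the 0/1 flags.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "movie_id": [0, 1, 2],
        "rating": [4, 5, 3],
        "action": [1, 0, 0],
        "comedy": [1, 1, 1],
        "drama": [1, 0, 1],
    }
)

# One row per (movie, genre) pair, with the 0/1 flag in the "value" column
long_df = pd.melt(
    df,
    id_vars=["movie_id", "rating"],
    value_vars=["action", "comedy", "drama"],
    var_name="genre",
)

# Keep only the pairs where the movie actually belongs to the genre
members = long_df.query("value == 1")
print(members)
```

Each remaining row is one movie counted once in one genre (movie 0 appears three times, for example), so grouping by genre and averaging rating gives the per-genre means.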