Home > Blockchain >  How to find mean of a column in Python based on multiple columns with Boolean values
How to find mean of a column in Python based on multiple columns with Boolean values

Time:11-15

I have a dataset, where I need to find top 2 average rated movies for each genre. With the given layout of data, how can I achieve it? Can someone please help me understand a workaround?

In order to simplify it, I have created new columns which separates values from 'genres' into unique genres and assigned values 1 and 0 to it. So, the group by will be based on these new unique genres and not the 'genres' column. For example, if I filter on the column 'Comedy' to 1, the average rating for title 'abc', 'pqr','ghi', and 'mno' are 3.75, 2.9, 2.25, and 3.12 respectively. So, the top 2 movies based on the average rating for Comedy genre will be abc and mno. Similar logic will be followed for other unique genres.

My original dataset is very similar to the sample dataset below:

| movieId | title |                      genres                     | userId | rating | release_year | review_year | Children | Action | Mystery | Sci-Fi | (no genres   listed) | Drama | Fantasy | Film-Noir | Romance | Thriller | Western | Comedy | Documentary | Crime | War | Horror | Animation | IMAX | Adventure | Musical |
|:-------:|:-----:|:-----------------------------------------------:|:------:|:------:|--------------|:-----------:|:--------:|:------:|:-------:|:------:|:--------------------:|:-----:|:-------:|:---------:|:-------:|:--------:|:-------:|:------:|:-----------:|:-----:|:---:|:------:|:---------:|:----:|:---------:|:-------:|
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 3437   | 3.5    | 1995         | 2009        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 7107   | 3      | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 17211  | 4      | 1995         | 2010        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 18056  | 3      | 1995         | 2013        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 22223  | 5      | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 28268  | 3.5    | 1995         | 2016        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 32984  | 4.5    | 1995         | 2013        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 33260  | 0.5    | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 35223  | 3.5    | 1995         | 2009        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 54852  | 4.5    | 1995         | 2014        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 56556  | 4      | 1995         | 2016        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 59622  | 5      | 1995         | 2017        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 78317  | 4.5    | 1995         | 2011        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 1       | abc   | Adventure__Animation__Children__Comedy__Fantasy | 80922  | 4      | 1995         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 1         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 8667   | 3      | 2000         | 2010        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 16458  | 2      | 2000         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 16464  | 3.5    | 2000         | 2011        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 19786  | 4      | 2000         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 2       | xyz   | Adventure__Children__Fantasy                    | 30494  | 3.5    | 2000         | 2005        | 1        | 0      | 0       | 0      | 0                    | 0     | 1       | 0         | 0       | 0        | 0       | 0      | 0           | 0     | 0   | 0      | 0         | 0    | 1         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 204961 | 4      | 2008         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 220587 | 3.5    | 2008         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 224358 | 3      | 2008         | 2014        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 231990 | 2.5    | 2008         | 2005        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 234484 | 3      | 2008         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 3       | pqr   | Comedy__Romance                                 | 251794 | 1.5    | 2008         | 2011        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 4294   | 3      | 2012         | 2016        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 81593  | 2.5    | 2012         | 2009        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 114192 | 2.5    | 2012         | 2013        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 178767 | 0.5    | 2012         | 2014        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 212863 | 2      | 2012         | 2005        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 4       | ghi   | Comedy__Drama__Romance                          | 251932 | 3      | 2012         | 2005        | 0        | 0      | 0       | 0      | 0                    | 1     | 0       | 0         | 1       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 16481  | 4      | 2015         | 2016        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 25066  | 4      | 2015         | 2014        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 27833  | 3      | 2015         | 2005        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |
| 5       | mno   | Comedy                                          | 31864  | 1.5    | 2015         | 2014        | 0        | 0      | 0       | 0      | 0                    | 0     | 0       | 0         | 0       | 0        | 0       | 1      | 0           | 0     | 0   | 0      | 0         | 0    | 0         | 0       |

as dataframe:

pd.DataFrame({'movieId': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5],
              'title': ['abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','xyz','xyz','xyz','xyz','xyz','pqr','pqr','pqr','pqr','pqr','pqr','ghi','ghi','ghi','ghi','ghi','ghi','mno','mno','mno','mno'],
              'genres': ['Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy','Comedy','Comedy','Comedy'],
              'userId': [3437,7107,17211,18056,22223,28268,32984,33260,35223,54852,56556,59622,78317,80922,8667,16458,16464,19786,30494,204961,220587,224358,231990,234484,251794,4294,81593,114192,178767,212863,251932,16481,25066,27833,31864],
              'rating': [3.5,3.0,4.0,3.0,5.0,3.5,4.5,0.5,3.5,4.5,4.0,5.0,4.5,4.0,3.0,2.0,3.5,4.0,3.5,4.0,3.5,3.0,2.5,3.0,1.5,3.0,2.5,2.5,0.5,2.0,3.0,4.0,4.0,3.0,1.5],
              'release_year': [1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,2000,2000,2000,2000,2000,2008,2008,2008,2008,2008,2008,2012,2012,2012,2012,2012,2012,2015,2015,2015,2015],
              'review_year': [2009,2014,2010,2013,2014,2016,2013,2014,2009,2014,2016,2017,2011,2005,2010,2005,2011,2005,2005,2016,2016,2014,2005,2016,2011,2016,2009,2013,2014,2005,2005,2016,2014,2005,2014],
              'Children': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Action': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Mystery': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Sci-Fi': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              '(no genres listed)': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Drama': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0],
              'Fantasy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Film-Noir': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Romance': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0],
              'Thriller': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Western': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Comedy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
              'Documentary': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Crime': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'War': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Horror': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Animation': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'IMAX': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Adventure': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
              'Musical': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})

My output would be similar to the one below:

| Genre     | title | avg rating |
|-----------|-------|------------|
| Children  | abc   | 3.75       |
| Drama     | ghi   | 2.25       |
| Fantasy   | abc   | 3.75       |
| Fantasy   | xyz   | 3.2        |
| Romance   | pqr   | 2.9        |
| Romance   | ghi   | 2.25       |
| Comedy    | abc   | 3.75       |
| Comedy    | mno   | 3.12       |
| Animation | abc   | 3.75       |
| Adventure | abc   | 3.75       |
| Adventure | xyz   | 3.2        |

Please note that this is a sample dataset and there are many unique genres whose value is 0. So, such unique genres are not present in the column 'genre' of the final output. Also, some of these unique genres have only one movie falling in their category, so they have only one listing of those movies.

CodePudding user response:

You want to get the top 2 movies per individual genre, you first need to melt the dummy variables, then groupby agg:

(df.melt(id_vars=df.columns[:7], var_name='genre')
   .query('value == 1')
   .groupby(['genre', 'movieId'], as_index=False).agg({'title': 'first', 'rating': 'mean'})
   .groupby('genre').head(2)
)

output:

        genre  movieId title    rating
0   Adventure        1   abc  3.750000
1   Adventure        2   xyz  3.200000
2   Animation        1   abc  3.750000
3    Children        1   abc  3.750000
4    Children        2   xyz  3.200000
5      Comedy        1   abc  3.750000
6      Comedy        3   pqr  2.916667
9       Drama        4   ghi  2.250000
10    Fantasy        1   abc  3.750000
11    Fantasy        2   xyz  3.200000
12    Romance        3   pqr  2.916667
13    Romance        4   ghi  2.250000
  • Related