I have a dataset, where I need to find top 2 average rated movies for each genre. With the given layout of data, how can I achieve it? Can someone please help me understand a workaround?
In order to simplify it, I have created new columns which separates values from 'genres' into unique genres and assigned values 1 and 0 to it. So, the group by will be based on these new unique genres and not the 'genres' column. For example, if I filter on the column 'Comedy' to 1, the average rating for title 'abc', 'pqr','ghi', and 'mno' are 3.75, 2.9, 2.25, and 3.12 respectively. So, the top 2 movies based on the average rating for Comedy genre will be abc and mno. Similar logic will be followed for other unique genres.
My original dataset is very similar to the sample dataset below:
| movieId | title | genres | userId | rating | release_year | review_year | Children | Action | Mystery | Sci-Fi | (no genres listed) | Drama | Fantasy | Film-Noir | Romance | Thriller | Western | Comedy | Documentary | Crime | War | Horror | Animation | IMAX | Adventure | Musical |
|:-------:|:-----:|:-----------------------------------------------:|:------:|:------:|--------------|:-----------:|:--------:|:------:|:-------:|:------:|:--------------------:|:-----:|:-------:|:---------:|:-------:|:--------:|:-------:|:------:|:-----------:|:-----:|:---:|:------:|:---------:|:----:|:---------:|:-------:|
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 3437 | 3.5 | 1995 | 2009 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 7107 | 3 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 17211 | 4 | 1995 | 2010 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 18056 | 3 | 1995 | 2013 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 22223 | 5 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 28268 | 3.5 | 1995 | 2016 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 32984 | 4.5 | 1995 | 2013 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 33260 | 0.5 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 35223 | 3.5 | 1995 | 2009 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 54852 | 4.5 | 1995 | 2014 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 56556 | 4 | 1995 | 2016 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 59622 | 5 | 1995 | 2017 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 78317 | 4.5 | 1995 | 2011 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | abc | Adventure__Animation__Children__Comedy__Fantasy | 80922 | 4 | 1995 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 8667 | 3 | 2000 | 2010 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 16458 | 2 | 2000 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 16464 | 3.5 | 2000 | 2011 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 19786 | 4 | 2000 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | xyz | Adventure__Children__Fantasy | 30494 | 3.5 | 2000 | 2005 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | pqr | Comedy__Romance | 204961 | 4 | 2008 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 220587 | 3.5 | 2008 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 224358 | 3 | 2008 | 2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 231990 | 2.5 | 2008 | 2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 234484 | 3 | 2008 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | pqr | Comedy__Romance | 251794 | 1.5 | 2008 | 2011 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 4294 | 3 | 2012 | 2016 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 81593 | 2.5 | 2012 | 2009 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 114192 | 2.5 | 2012 | 2013 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 178767 | 0.5 | 2012 | 2014 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 212863 | 2 | 2012 | 2005 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ghi | Comedy__Drama__Romance | 251932 | 3 | 2012 | 2005 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 16481 | 4 | 2015 | 2016 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 25066 | 4 | 2015 | 2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 27833 | 3 | 2015 | 2005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | mno | Comedy | 31864 | 1.5 | 2015 | 2014 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
as dataframe:
pd.DataFrame({'movieId': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5],
'title': ['abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','abc','xyz','xyz','xyz','xyz','xyz','pqr','pqr','pqr','pqr','pqr','pqr','ghi','ghi','ghi','ghi','ghi','ghi','mno','mno','mno','mno'],
'genres': ['Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Animation__Children__Comedy__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Adventure__Children__Fantasy','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy__Drama__Romance','Comedy','Comedy','Comedy','Comedy'],
'userId': [3437,7107,17211,18056,22223,28268,32984,33260,35223,54852,56556,59622,78317,80922,8667,16458,16464,19786,30494,204961,220587,224358,231990,234484,251794,4294,81593,114192,178767,212863,251932,16481,25066,27833,31864],
'rating': [3.5,3.0,4.0,3.0,5.0,3.5,4.5,0.5,3.5,4.5,4.0,5.0,4.5,4.0,3.0,2.0,3.5,4.0,3.5,4.0,3.5,3.0,2.5,3.0,1.5,3.0,2.5,2.5,0.5,2.0,3.0,4.0,4.0,3.0,1.5],
'release_year': [1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,1995,2000,2000,2000,2000,2000,2008,2008,2008,2008,2008,2008,2012,2012,2012,2012,2012,2012,2015,2015,2015,2015],
'review_year': [2009,2014,2010,2013,2014,2016,2013,2014,2009,2014,2016,2017,2011,2005,2010,2005,2011,2005,2005,2016,2016,2014,2005,2016,2011,2016,2009,2013,2014,2005,2005,2016,2014,2005,2014],
'Children': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Action': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Mystery': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Sci-Fi': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'(no genres listed)': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Drama': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0],
'Fantasy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Film-Noir': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Romance': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0],
'Thriller': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Western': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Comedy': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
'Documentary': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Crime': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'War': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Horror': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Animation': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'IMAX': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Adventure': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
'Musical': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})
My output would be similar to the one below:
| Genre | title | avg rating |
|-----------|-------|------------|
| Children | abc | 3.75 |
| Drama | ghi | 2.25 |
| Fantasy | abc | 3.75 |
| Fantasy | xyz | 3.2 |
| Romance | pqr | 2.9 |
| Romance | ghi | 2.25 |
| Comedy | abc | 3.75 |
| Comedy | mno | 3.12 |
| Animation | abc | 3.75 |
| Adventure | abc | 3.75 |
| Adventure | xyz | 3.2 |
Please note that this is a sample dataset and there are many unique genres whose value is 0. So, such unique genres are not present in the column 'genre' of the final output. Also, some of these unique genres have only one movie falling in their category, so they have only one listing of those movies.
CodePudding user response:
You want to get the top 2 movies per individual genre, you first need to melt
the dummy variables, then groupby
agg
:
(df.melt(id_vars=df.columns[:7], var_name='genre')
.query('value == 1')
.groupby(['genre', 'movieId'], as_index=False).agg({'title': 'first', 'rating': 'mean'})
.groupby('genre').head(2)
)
output:
genre movieId title rating
0 Adventure 1 abc 3.750000
1 Adventure 2 xyz 3.200000
2 Animation 1 abc 3.750000
3 Children 1 abc 3.750000
4 Children 2 xyz 3.200000
5 Comedy 1 abc 3.750000
6 Comedy 3 pqr 2.916667
9 Drama 4 ghi 2.250000
10 Fantasy 1 abc 3.750000
11 Fantasy 2 xyz 3.200000
12 Romance 3 pqr 2.916667
13 Romance 4 ghi 2.250000