I have a table Movie with columns Movie and User (the viewer), where each movie can be viewed by any user any number of times, so the table can contain duplicate rows. I want to find the Top N most viewed movies and then the Top K viewers for each of those Top N movies. How can I apply group by or partition by effectively in such a scenario? If there is a better approach, please share. Thanks!
Movie | User |
---|---|
Avengers | John |
Batman | Chris |
Batman | Ron |
X-Men | Chris |
X-Men | Ron |
Matrix | John |
Batman | Martin |
Matrix | Chris |
Batman | Chris |
X-Men | Ron |
So, in this table Batman is the most watched movie, followed by X-Men, so I want the result table to look like:
Movie | User | View count |
---|---|---|
Batman | Chris | 2 |
Batman | Ron | 1 |
Batman | Martin | 1 |
X-Men | Ron | 2 |
X-Men | Chris | 1 |
Matrix | John | 1 |
Matrix | Chris | 1 |
Avengers | John | 1 |
I understand that I can group by movie and then order by count(*) desc, but this doesn't give me the second column, grouped by viewer, or the count for each viewer.
CodePudding user response:
Consider the approach below (assuming Top 3 movies with Top 2 users):
select movie, user, view_count
from (
  select distinct *,
    count(*) over(partition by movie) movie_views,
    count(*) over(partition by movie, user) view_count
  from your_table
)
qualify dense_rank() over(order by movie_views desc) <= 3
  and row_number() over(partition by movie order by view_count desc) <= 2
-- order by movie_views desc, view_count desc
If applied to the sample data in your question, the output lists the top 2 viewers for each of Batman, X-Men and Matrix.
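A variant of the same idea, as a sketch rather than a drop-in replacement: aggregate to one row per (movie, user) with group by first, then rank with window functions. It avoids qualify, so it may also help on engines that lack it. The table and column names are taken from the query above; the CTE names (per_viewer, ranked), the rank columns, and the alphabetical tie-breakers are my own assumptions.

```sql
-- Sketch: top 3 movies with top 2 viewers each, via GROUP BY plus window functions.
-- your_table, movie and user come from the answer above; everything else is illustrative.
with per_viewer as (
  -- one row per (movie, user) with that viewer's view count
  select movie, user, count(*) as view_count
  from your_table
  group by movie, user
),
ranked as (
  select
    movie,
    user,
    view_count,
    -- total views per movie, summed over the per-viewer counts
    sum(view_count) over (partition by movie) as movie_views,
    -- viewer position within a movie; ties broken by user name (assumption)
    row_number() over (partition by movie order by view_count desc, user) as viewer_rank
  from per_viewer
)
select movie, user, view_count
from (
  -- movies tied on total views share a rank, so ties can yield more than N movies
  select *, dense_rank() over (order by movie_views desc) as movie_rank
  from ranked
)
where movie_rank <= 3    -- top N movies
  and viewer_rank <= 2   -- top K viewers per movie
order by movie_views desc, movie, view_count desc, user
```

With the sample data from the question this should return Batman (Chris 2, Martin 1), X-Men (Ron 2, Chris 1) and Matrix (Chris 1, John 1); Martin is kept over Ron only because of the alphabetical tie-breaker. If user is a reserved word in your engine, the column needs quoting.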