I have been practising SQL, and came across this behaviour i couldnt explain. ( I am also the one who asked this question : Over() function does not cover all rows in the table) -> its a different problem.
Suppose i have a table like this
MovieRating table:
movie_id | user_id | rating | created_at |
---|---|---|---|
1 | 1 | 3 | 2020-01-12 |
1 | 2 | 4 | 2020-02-11 |
1 | 3 | 2 | 2020-02-12 |
1 | 4 | 1 | 2020-01-01 |
2 | 1 | 5 | 2020-02-17 |
2 | 2 | 2 | 2020-02-01 |
2 | 3 | 2 | 2020-03-01 |
3 | 1 | 3 | 2020-02-22 |
3 | 2 | 4 | 2020-02-25 |
What I am trying to do, is to rank the movie by rating, which i have this SQL query:
SELECT
movie_id,
rank() over(partition by movie_id order by avg(rating) desc) as rank_rate
FROM
MovieRating
From my previous question, i learnt that the over() function will operate in a window selected by the query, basically the window this query returns:
SELECT movie_id FROM MovieRating
So I would expect to see at least 3 rows here, for id 1, 2 and 3.
The result is however just one row:
{"headers": ["movie_id", "rank_rate"], "values": [[1, 1]]}
Why is that ? Is something wrong with my understanding regarding how over()
function works ?
CodePudding user response:
You need an aggregation query and use RANK()
window function on its results:
SELECT movie_id,
AVG(rating) AS average_rating, -- you may remove this line if you don't actually need the average rating
RANK() OVER (ORDER BY AVG(rating) DESC) AS rank_rate
FROM MovieRating
GROUP BY movie_id
ORDER BY rank_rate;
See the demo.
Your query is an aggregation query without a group by
clause and this means that it operates on the whole table and not to each movie_id
. Such queries return only 1 row with the result of the aggregation.
When yo apply RANK()
window function, it will operate on that single row and not on the table.
CodePudding user response:
I think you mean to get one row for each movie, with its average rating.
You should use GROUP BY
, not a window function:
SELECT movie_id, AVG(rating) AS avg_rating
FROM MovieRating
GROUP BY movie_id
ORDER BY avg_rating DESC;
https://www.db-fiddle.com/f/o9qLFbJEwhaHDWoTS9Qfwp/1
The reason you only got one row is that when you use an aggregate function like AVG()
, that implicitly makes the query into an aggregating query. The result of the query is one row per group.
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html says:
If you use an aggregate function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
In other words, the whole table is considered one "group" if you use AVG()
but don't specify a GROUP BY
expression. Because the whole table is a single group, the result is one row.
Windows defined by windowing functions are not the same as groups defined by aggregate functions. The window functions are applied after the rows have been reduced by aggregation. Since there was only one group and therefore one row in your result, the rank was 1.