SELECT user_id AS buyer_id, join_date,
CASE
WHEN YEAR(order_date) = 2019 THEN COUNT(order_date)
ELSE 0
END AS orders_in_2019
FROM Users U LEFT JOIN Orders O ON U.user_id = O.buyer_id
GROUP BY user_id;
I am facing problem with orders_in_2019 column for user_id no. 1. It's count should be 1 as per the solution but i am getting the output as 0. I have seen some solutions where adding
another condition like AND YEAR(order_date) = 2019
while joining solves the problem but why it is so? Please help me with this doubt.
Thanks
Link to the problem : https://leetcode.com/problems/market-analysis-i/
CodePudding user response:
The COUNT is an aggregate function, and you are using it incorrectly. Change your query a little to:
SELECT user_id AS buyer_id, join_date,
SUM(CASE
WHEN YEAR(order_date) = 2019 THEN 1
ELSE 0
END) AS orders_in_2019
FROM Users U LEFT JOIN Orders O ON U.user_id = O.buyer_id
GROUP BY user_id;
CodePudding user response:
The issue is the placement of the aggregate COUNT function in a scalar context, inside the CASE expression.
The value of the aggregate will not be available until after the GROUP BY operation, and the query is going to be looking at a value of order_date
column from just one of the joined rows. Some of those may be 2019 dates, and some may not. That's why adding a condition to the JOIN condition, so only 2019 order dates are returned seems to work.
For the Conditional Aggregation pattern, do the aggregation outside of the conditional test.
We want a conditional test on each individual row, something like this
CASE WHEN some_condition_is_true THEN 1 ELSE 0 END
if the condition is true, return 1 else return 0. First write the query without the aggregate, to see what is being returned,
SELECT u.user_id AS buyer_id
, u.join_date
, o.order_date
, YEAR(o.order_date)
, CASE WHEN YEAR(o.order_date) = 2019 THEN 1 ELSE 0 END AS order_in_2019
, CASE WHEN YEAR(o.order_date) = 2020 THEN 1 ELSE 0 END AS order_in_2020
FROM `Users` u
LEFT
JOIN `Orders` o
ON u.user_id = o.buyer_id
ORDER
BY u.user_id
, u.join_date
, o.order_date
The CASE expressions are going to return either a 0 or 1 for each row returned, which is what we want.
We can now tweak that query to add a GROUP BY
clause to collapse the rows, and we can wrap the CASE expressions in SUM aggregate to add up the 1s and 0s to get a count.
Conditional aggregation makes sense when we are wanting to return multiple counts, if we filter out all rows from Orders that arent in 2019 (adding a condition to the JOIN), then we cant get counts for other years,
Demonstration of conditional aggregation pattern
SELECT u.user_id AS buyer_id
, u.join_date
, SUM( CASE WHEN YEAR(o.order_date) = 2019 THEN 1 ELSE 0 END ) AS orders_in_2019
, SUM( CASE WHEN YEAR(o.order_date) = 2020 THEN 1 ELSE 0 END ) AS orders_in_2020
FROM `Users` u
LEFT
JOIN `Orders` o
ON u.user_id = o.buyer_id
GROUP
BY u.user_id
ORDER
BY u.user_id