I have a list of countries, and for each country I have a table like this (first column: user, second column: user_type, third column: date):
user | user_type | date |
---|---|---|
user1 | National | 2022-10-01 |
user1 | National | 2022-10-01 |
user2 | National | 2022-10-01 |
user2 | International | 2022-10-01 |
user3 | National | 2022-10-02 |
user1 | Unknown | 2022-10-02 |
user1 | National | 2022-10-03 |
I would like to get all the distinct users of each type, in one row per day, like this (4 columns: date, first_user_type, second_user_type, third_user_type):
date | first_user_type | second_user_type | third_user_type |
---|---|---|---|
2022-10-01 | 2 | 1 | 0 |
2022-10-02 | 1 | 0 | 1 |
2022-10-03 | 1 | 0 | 0 |
However, if I run this query:
SELECT
date, user_type, COUNT(distinct user) as num_users
FROM "country"."table"
WHERE
date between '2022-10-01' AND '2022-10-03'
GROUP BY date, user_type
ORDER BY date, user_type
Then I obtain the correct results but with the same date in different rows, like this:
date | user_type | num_users |
---|---|---|
2022-10-01 | National | 2 |
2022-10-01 | International | 1 |
2022-10-01 | Unknown | 0 |
2022-10-02 | National | 1 |
2022-10-02 | International | 0 |
2022-10-02 | Unknown | 1 |
2022-10-03 | National | 1 |
2022-10-03 | International | 0 |
2022-10-03 | Unknown | 0 |
But there is an additional difficulty: there are 3 user_types globally ("National", "International" and "Unknown"), but some countries only have "National", or "National" and "Unknown" (no "International"). I would like the result to show a 0 for a user_type even if it does not exist in that country.
The query should be the same for all countries, only changing "country".
(P.S.: the query should run in AWS Athena)
Any ideas?
CodePudding user response:
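Conditional aggregation should do it here. Each user type gets its own hard-coded column, so a type that is missing in a given country simply counts to 0: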
SELECT date_,
COUNT(DISTINCT IF(user_type = 'National' , user_, NULL)) AS numNational,
COUNT(DISTINCT IF(user_type = 'International', user_, NULL)) AS numInternational,
COUNT(DISTINCT IF(user_type = 'Unknown' , user_, NULL)) AS numUnknown
FROM tab
GROUP BY date_
Check the demo here.
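If you want to check the logic without an Athena account, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database using Python's built-in `sqlite3` module. A few assumptions: the table and column names (`tab`, `user_`, `date_`) follow the query above, the helper name `daily_counts` is made up for this example, and the `IF(...)` calls are written as `CASE WHEN ... END` because that form is portable across engines. It is only meant to confirm the expected shape of the result, not to stand in for Athena.

```python
import sqlite3

# Sample rows copied from the question: (user, user_type, date).
ROWS = [
    ("user1", "National", "2022-10-01"),
    ("user1", "National", "2022-10-01"),
    ("user2", "National", "2022-10-01"),
    ("user2", "International", "2022-10-01"),
    ("user3", "National", "2022-10-02"),
    ("user1", "Unknown", "2022-10-02"),
    ("user1", "National", "2022-10-03"),
]

# Conditional aggregation: CASE yields the user only for the matching type
# and NULL otherwise; COUNT(DISTINCT ...) ignores the NULLs.
QUERY = """
SELECT date_,
       COUNT(DISTINCT CASE WHEN user_type = 'National'      THEN user_ END) AS numNational,
       COUNT(DISTINCT CASE WHEN user_type = 'International' THEN user_ END) AS numInternational,
       COUNT(DISTINCT CASE WHEN user_type = 'Unknown'       THEN user_ END) AS numUnknown
FROM tab
WHERE date_ BETWEEN '2022-10-01' AND '2022-10-03'
GROUP BY date_
ORDER BY date_
"""


def daily_counts(rows):
    """Load rows into a throwaway SQLite table and run the pivot query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tab (user_ TEXT, user_type TEXT, date_ TEXT)")
        conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in daily_counts(ROWS):
        print(row)
```

On the sample data this gives one row per day with the counts 2/1/0, 1/0/1 and 1/0/0, which matches the output asked for in the question. Passing rows that contain only "National" users still produces all three columns, with 0 in the other two.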
CodePudding user response:
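-- COUNT(DISTINCT ...) instead of SUM(... THEN 1 ELSE 0 END): SUM counts rows,
-- so a user who appears twice on the same day would be counted twice.
SELECT date,
COUNT(DISTINCT CASE WHEN user_type = 'National' THEN user END) AS National,
COUNT(DISTINCT CASE WHEN user_type = 'International' THEN user END) AS International,
COUNT(DISTINCT CASE WHEN user_type = 'Unknown' THEN user END) AS Unknown
FROM "country"."table"
WHERE date BETWEEN '2022-10-01' AND '2022-10-03'
GROUP BY date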
I hope this helps!