SQL get count of distinct user types in one row per date

Time:12-23

I have a list of countries, and for each country I have a table like this (first column: user, second column: user_type, third column: date):

user user_type date
user1 National 2022-10-01
user1 National 2022-10-01
user2 National 2022-10-01
user2 International 2022-10-01
user3 National 2022-10-02
user1 Unknown 2022-10-02
user1 National 2022-10-03

I would like to get all the distinct users of each type, in one row per day, like this (4 columns: date, first_user_type, second_user_type, third_user_type):

date first_user_type second_user_type third_user_type
2022-10-01 2 1 0
2022-10-02 1 0 1
2022-10-03 1 0 0

However, if I run this query:

SELECT 
date, user_type, COUNT(distinct user) as num_users
FROM "country"."table" 
WHERE 
date between '2022-10-01' AND '2022-10-03' 
GROUP BY date, user_type 
ORDER BY date, user_type

Then I obtain the correct results but with the same date in different rows, like this:

date user_type num_users
2022-10-01 National 2
2022-10-01 International 1
2022-10-01 Unknown 0
2022-10-02 National 1
2022-10-02 International 0
2022-10-02 Unknown 1
2022-10-03 National 1
2022-10-03 International 0
2022-10-03 Unknown 0

There is an additional difficulty: globally there are 3 user_types ("National", "International" and "Unknown"), but some countries only have "National", or only "National" and "Unknown" (no "International"). I would like the result to show a 0 even if a user_type does not exist in that country. The query should be the same for all countries, only changing "country".

(P.S.: the query needs to run in AWS Athena.)

Any ideas?

CodePudding user response:

Conditional aggregation should do it here:

SELECT date_, 
       COUNT(DISTINCT IF(user_type = 'National'     , user_, NULL)) AS numNational,
       COUNT(DISTINCT IF(user_type = 'International', user_, NULL)) AS numInternational,
       COUNT(DISTINCT IF(user_type = 'Unknown'      , user_, NULL)) AS numUnknown
FROM tab
GROUP BY date_

Check the demo here.
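
To try the idea without Athena, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database via Python's built-in sqlite3 module. The table name tab and the columns user_ and date_ follow the query above. CASE WHEN is used in place of IF() as an assumption for portability, since SQLite's dialect differs from Athena's; the logic is otherwise the same conditional aggregation.

```python
import sqlite3

# Sample rows copied from the question: (user_, user_type, date_)
rows = [
    ("user1", "National", "2022-10-01"),
    ("user1", "National", "2022-10-01"),
    ("user2", "National", "2022-10-01"),
    ("user2", "International", "2022-10-01"),
    ("user3", "National", "2022-10-02"),
    ("user1", "Unknown", "2022-10-02"),
    ("user1", "National", "2022-10-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (user_ TEXT, user_type TEXT, date_ TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)

# A CASE without ELSE yields NULL for non-matching rows, and
# COUNT(DISTINCT ...) ignores NULLs, so a type with no rows counts as 0.
query = """
SELECT date_,
       COUNT(DISTINCT CASE WHEN user_type = 'National'      THEN user_ END) AS numNational,
       COUNT(DISTINCT CASE WHEN user_type = 'International' THEN user_ END) AS numInternational,
       COUNT(DISTINCT CASE WHEN user_type = 'Unknown'       THEN user_ END) AS numUnknown
FROM tab
GROUP BY date_
ORDER BY date_
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this gives one row per date with the counts 2/1/0, 1/0/1 and 1/0/0, which matches the output requested in the question.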

CodePudding user response:

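-- COUNT(DISTINCT ...) counts each user once per type; summing a 0/1 flag would count duplicate rows.
-- A CASE without ELSE returns NULL for other types, and NULLs are not counted.
SELECT date,
       COUNT(DISTINCT CASE WHEN user_type = 'National' THEN user END) AS National,
       COUNT(DISTINCT CASE WHEN user_type = 'International' THEN user END) AS International,
       COUNT(DISTINCT CASE WHEN user_type = 'Unknown' THEN user END) AS Unknown
FROM "country"."table"
WHERE date BETWEEN '2022-10-01' AND '2022-10-03'
GROUP BY date
ORDER BY date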

I hope this helps!
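
One thing to watch with this kind of query: summing a 0/1 flag counts rows, while COUNT(DISTINCT ...) counts users. The sample data lists user1 twice as National on 2022-10-01, so the two give different numbers. A small sketch of the difference, again using SQLite through Python's sqlite3 module as a stand-in for Athena (the names tab, user_ and date_ are placeholders, not the real table):

```python
import sqlite3

# Only the 2022-10-01 rows from the question; note the duplicated user1 row
# and that this "country" has no Unknown rows at all.
rows = [
    ("user1", "National", "2022-10-01"),
    ("user1", "National", "2022-10-01"),
    ("user2", "National", "2022-10-01"),
    ("user2", "International", "2022-10-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (user_ TEXT, user_type TEXT, date_ TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)

query = """
SELECT date_,
       SUM(CASE WHEN user_type = 'National' THEN 1 ELSE 0 END)        AS national_rows,
       COUNT(DISTINCT CASE WHEN user_type = 'National' THEN user_ END) AS national_users,
       COUNT(DISTINCT CASE WHEN user_type = 'Unknown'  THEN user_ END) AS unknown_users
FROM tab
GROUP BY date_
"""
comparison = conn.execute(query).fetchall()
print(comparison)
```

Here national_rows comes out as 3 but national_users as 2, and unknown_users is 0 even though no Unknown row exists, which is the behaviour the question asks for when a country lacks a user_type.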
