I want to count the distinct amount of users over the last 60 days, and then, count the distinct amount of users over the last 59 days, and so on and so forth.
Ideally, the output would look like this (TARGET OUTPUT)
Day Distinct Users
60 200
59 200
58 188
57 185
56 180
[...] [...]
where 60 days is the max total possible distinct users, and then 59 would have a little less and so on and so forth.
my query looks like this.
select
count(distinct (case when datediff(day,DATE,current_date) <= 60 then USER_ID end)) as day_60,
count(distinct (case when datediff(day,DATE,current_date) <= 59 then USER_ID end)) as day_59,
count(distinct (case when datediff(day,DATE,current_date) <= 58 then USER_ID end)) as day_58
FROM Table
The issue with my query is that This outputs the data by column instead of by rows (like shown below) AND, most importantly, I have to write out this logic 60x for each of the 60 days.
Current Output:
Day_60 Day_59 Day_58
209 207 207
Is it possible to write the SQL in a way that creates the target as shown initially above?
CodePudding user response:
You can do unpivot after getting your current output. sample one. select * from ( select 209 Day_60, 207 Day_59, 207 Day_58 )unpivot ( cnt for days in (Day_60,Day_59,Day_58));
CodePudding user response:
Using below data in CTE format -
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID1'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID1'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID1'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID1')
)
Query to get all dates and count and distinct counts -
select dates,count(userid) cnt, count(distinct userid) cnt_d
from data_cte
group by dates;
DATES | CNT | CNT_D |
---|---|---|
2022-05-01 | 2 | 2 |
2022-05-02 | 2 | 2 |
2022-05-03 | 3 | 3 |
2022-05-04 | 5 | 4 |
2022-05-05 | 1 | 1 |
2022-05-06 | 1 | 1 |
2022-05-08 | 1 | 1 |
2022-05-07 | 2 | 2 |
Query to get difference of date from current date
select dates,datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates;
DATES | DDIFF | CNT | CNT_D |
---|---|---|---|
2022-05-01 | 45 | 2 | 2 |
2022-05-02 | 44 | 2 | 2 |
2022-05-03 | 43 | 3 | 3 |
2022-05-04 | 42 | 5 | 4 |
2022-05-05 | 41 | 1 | 1 |
2022-05-06 | 40 | 1 | 1 |
2022-05-08 | 38 | 1 | 1 |
2022-05-07 | 39 | 2 | 2 |
Get records with date difference beyond a certain range only -
include clause having
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43;
DDIFF | CNT | CNT_D |
---|---|---|
43 | 3 | 3 |
42 | 5 | 4 |
41 | 1 | 1 |
39 | 2 | 2 |
38 | 1 | 1 |
40 | 1 | 1 |
If you need to prefix 'day' to each date diff count, you can add and outer query to previously fetched data-set and add the needed prefix to the date diff column as following -
I am using CTE syntax, but you may use sub-query given you will select from table -
,cte_1 as (
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43)
select 'day_'||to_char(ddiff) days,
cnt,
cnt_d
from cte_1;
DAYS | CNT | CNT_D |
---|---|---|
day_43 | 3 | 3 |
day_42 | 5 | 4 |
day_41 | 1 | 1 |
day_39 | 2 | 2 |
day_38 | 1 | 1 |
day_40 | 1 | 1 |