I currently have the following UsersData table which gives aggregated historical data at a given particular date:
Date | UserID | Name | isActive |
---|---|---|---|
2021-10-01 | 1 | Sam | 1 |
2021-10-01 | 2 | Dan | 1 |
2021-10-08 | 1 | Sam | 0 |
2021-10-08 | 2 | Dan | 1 |
Requirement
My requirement is to create another aggregated data which will show active vs inactive record for a the above given dates - in a single row. So something like below:
Date | Active | Inactive | Total |
---|---|---|---|
2021-10-01 | 2 | 0 | 2 |
2021-10-08 | 1 | 1 | 2 |
My SQL queries so far
Now when I try the following individual queries it works fine:
select date, count(distinct userid) AS ActiveCount from User where isActive= 1 group by date
select date, count(distinct userid) AS InactiveCount from User where isActive= 0 group by date
But since I need to display the statistics in a single row for each date, I tried the following query but there's something which I seem to be doing wrong here:
select
date,
(select count(distinct userid) from User where isActive= 1 group by date) AS Active,
(select count(distinct userid) from User where isActive= 0 group by date) AS Inactive,
count(distinct userid) AS total
from userdata
group by date
order by date
With this I get the output for inactive and active records as the sum of both results - Active = 3 (2 from first date 1 from second date) and 'Inactive' = 2 (0 from first date 1 from second date) Whereas 'TotalCount' value is accurate.
Here's the output which I get with the above query:
Date | Active | Inactive | Total |
---|---|---|---|
2021-10-01 | 3 | 1 | 2 |
2021-10-08 | 3 | 1 | 2 |
What am I doing wrong here? What would be the correct query? I'm executing these currently in Databricks Delta Lake SQL.
CodePudding user response:
I'm not sure about databricks, but I think this should work...
select date
, sum(case when isActive=1 then 1 else 0 end) AS ActiveCount
, sum(case when isActive=0 then 1 else 0 end) AS InactiveCount
, count(distinct userId) as total
from User
group by date
CodePudding user response:
select date
, count(distinct userid) filter (where isActive= 1) AS ActiveCount
, count(distinct userid) filter (where isActive= 0) AS InactiveCount
, count(distinct userid) Total
from User
group by date