I have a table that contains 3 different columns:
- itemid,
- caseid and
- userid.
I would like to count two different types of scenarios:
- If there for the same combination of itemid, caseid, count(distinct userid) = 1, count this total number.
- If for the same combination of itemid, caseid, count(distinct userid) > 1, count this number.
How can I formulate a query that does this in one query?
I've tried something like this, but this doesn't work.
SELECT CASE WHEN eventid, caseid, count (distinct userid) =1 AS total_unique THEN count(totalunique) END AS total_unique,
CASE WHEN eventid, caseid, count (distinct userid) > 1 AS total_multiple then count(totalmultiple) END AS total_multiple
FROM TBL_1
CodePudding user response:
with data as (
select eventid, caseid, count(distinct userid) as cnt_user
from T
group by eventid, caseid
)
select
count(case when cnt_user = 1 then 1 end) as cnt_unique,
count(case when cnt_user > 1 then 1 end) as cnt_multiple
from data;
This can be combined into a shorter query with distinct
sum() over
:
select distinct
count(case when count(distinct userid) = 1 then 1 end) over () as total_unique,
count(case when count(distinct userid) > 1 then 1 end) over () as total_multiple
from T
group by eventid, caseid;
CodePudding user response:
With this CTE just for simplicity of testing:
WITH fake_data(itemid, caseid, userid) AS (
SELECT * FROM VALUES
/* we will have three (itemid,caseid)'s with a count of =1*/
(1,10,100),
(1,11,100),
(1,12,100),
/* we will have two (itemid,caseid)'s with a count of >1*/
(2,11,101),
(2,11,102),
(2,12,101),
(2,12,102)
)
we want to see 3 counts of ones at the end, and 2 counts of > one.
a building block step is to count those sets:
SELECT
itemid,
caseid,
count(distinct userid) as cd_userid
FROM fake_data
GROUP BY 1,2;
which gives:
ITEMID | CASEID | CD_USERID |
---|---|---|
1 | 10 | 1 |
1 | 11 | 1 |
1 | 12 | 1 |
2 | 11 | 2 |
2 | 12 | 2 |
and now to count the counts. This needs to be done (in this method) as another layer as the context of "what is being grouped over" needs to be expressed. You did mention "in one expression" which I think you mean as not a nest sub-select.
SELECT
count_if(cd_userid = 1) as count_of_ones,
count_if(cd_userid > 1) as count_of_gt_ones,
count(*) as count_of_totals,
count_of_totals - count_of_ones as count_of_gt_ones_b
FROM (
SELECT
itemid,
caseid,
count(distinct userid) as cd_userid
FROM fake_data
GROUP BY 1,2
);
COUNT_OF_ONES | COUNT_OF_GT_ONES | COUNT_OF_TOTALS | COUNT_OF_GT_ONES_B |
---|---|---|---|
3 | 2 | 5 | 2 |
this shows there are a couple of ways to get the both counts as they are related to each other.
So two mash this into one layer of query, which is really the same as the two layers and thus possible less readable, and might depending how you merge this with a larger block of SQL less more/less performant, and very similar to Shawn's answer, but skipping the CASE/IFF via a COUNT_IF, and again showing there are two ways to get the second answer (depending how obtuse you wish to be)
select distinct
count_if(count(distinct userid) = 1) over() as count_of_ones,
count_if(count(distinct userid) > 1) over() as count_of_gt_ones,
count(distinct itemid, caseid) over() - count_of_ones as count_of_gt_ones_b
from fake_data
group by itemid, caseid;
gives:
COUNT_OF_ONES | COUNT_OF_GT_ONES | COUNT_OF_GT_ONES_B |
---|---|---|
3 | 2 | 2 |