Background
I'm a novice Postgres user running a local server on a Windows 10 machine. I've got a dataset g
that looks like this:
-- --------- ----------------
|id|treatment|outcome_category|
-- --------- ----------------
|a |1 |cardiovascular |
|a |0 |cardiovascular |
|b |0 |metabolic |
|b |0 |sensory |
|c |1 |NULL |
|c |0 |cardiovascular |
|c |1 |sensory |
|d |1 |NULL |
|d |0 |cns |
-- --------- ----------------
The Problem
I'd like to get a count of outcome_category
by outcome_category
for those id
who are "ever treated" -- defined as "id
's who have any row where treatment
=1".
Here's the desired result:
---------------- ---------
|outcome_category| count |
---------------- ---------
|cardiovascular | 3 |
|sensory | 1 |
|cns | 1 |
---------------- ---------
It would be fine if the result had to contain metabolic
, like so:
---------------- ---------
|outcome_category|treatment|
---------------- ---------
|cardiovascular | 3 |
|metabolic | 0 |
|sensory | 1 |
|cns | 1 |
---------------- ---------
Obviously I don't need the rows to be in any particular order, though descending would be nice.
What I've tried
Here's a query I've written:
select treatment, outcome_category, sum(outcome_ct)
from (select max(treatment) as treatment,
outcome_category,
count(outcome_category) as outcome_ct
from g
group by outcome_category) as sub
group by outcome_category, sub.treatment;
But it's a mishmash result:
--------- ---------------- ---
|treatment|outcome_category|sum|
--------- ---------------- ---
|1 |cardiovascular |3 |
|1 |sensory |2 |
|0 |metabolic |1 |
|1 |NULL |0 |
|0 |cns |1 |
--------- ---------------- ---
I'm trying to identify the "ever exposed" id
's using that first line in the subquery: select max(treatment) as treatment
. But I'm not quite getting at the rest of it.
EDIT
I realized that the toy dataset g
I originally gave you above doesn't correspond to the idiosyncrasies of my real dataset. I've updated g
to reflect that many id
's who are "ever treated" won't have a non-null outcome_category
next to a row with treatment
=1.
CodePudding user response:
This would appear to be just a simple aggregation,
select outcome_category, Count(*) count
from t
where treatment=1
group by outcome_category
order by Count(*) desc
CodePudding user response:
Interesting little problem. You can do:
select
outcome_category,
count(x.id) as count
from g
left join (
select distinct id from g where treatment = 1
) x on x.id = g.id
where outcome_category is not null
group by outcome_category
order by count desc
Result:
outcome_category count
----------------- -----
cardiovascular 3
sensory 1
cns 1
metabolic 0
See running example at db<>fiddle.