In PostgreSQL, conditionally count rows-CodePudding

Background

I'm a novice Postgres user running a local server on a Windows 10 machine. I've got a dataset g that looks like this:

 -- --------- ---------------- 
|id|treatment|outcome_category|
 -- --------- ---------------- 
|a |1        |cardiovascular  |
|a |0        |cardiovascular  |
|b |0        |metabolic       |
|b |0        |sensory         |
|c |1        |NULL            |
|c |0        |cardiovascular  |
|c |1        |sensory         |
|d |1        |NULL             |
|d |0        |cns            |
 -- --------- ----------------

The Problem

I'd like to get a count of outcome_category by outcome_category for those id who are "ever treated" -- defined as "id's who have any row where treatment=1".

Here's the desired result:

 ---------------- --------- 
|outcome_category|  count  |
 ---------------- --------- 
|cardiovascular  |    3    |
|sensory         |    1    |
|cns             |    1    |
 ---------------- ---------

It would be fine if the result had to contain metabolic, like so:

 ---------------- --------- 
|outcome_category|treatment|
 ---------------- --------- 
|cardiovascular  |    3    |
|metabolic       |    0    |
|sensory         |    1    |
|cns             |    1    |
 ---------------- ---------

Obviously I don't need the rows to be in any particular order, though descending would be nice.

What I've tried

Here's a query I've written:

select treatment, outcome_category, sum(outcome_ct)
from (select max(treatment) as treatment,
    outcome_category,
    count(outcome_category) as outcome_ct
    from g
    group by outcome_category) as sub
group by outcome_category, sub.treatment;

But it's a mishmash result:

 --------- ---------------- --- 
|treatment|outcome_category|sum|
 --------- ---------------- --- 
|1        |cardiovascular  |3  |
|1        |sensory         |2  |
|0        |metabolic       |1  |
|1        |NULL            |0  |
|0        |cns             |1  |
 --------- ---------------- ---

I'm trying to identify the "ever exposed" id's using that first line in the subquery: select max(treatment) as treatment. But I'm not quite getting at the rest of it.

EDIT

I realized that the toy dataset g I originally gave you above doesn't correspond to the idiosyncrasies of my real dataset. I've updated g to reflect that many id's who are "ever treated" won't have a non-null outcome_category next to a row with treatment=1.

CodePudding user response：

This would appear to be just a simple aggregation,

select outcome_category, Count(*) count
from t
where treatment=1
group by outcome_category
order by Count(*) desc

Demo fiddle

CodePudding user response：

Interesting little problem. You can do:

select
  outcome_category,
  count(x.id) as count
from g
left join (
  select distinct id from g where treatment = 1
) x on x.id = g.id
where outcome_category is not null
group by outcome_category
order by count desc

Result:

 outcome_category  count 
 ----------------- ----- 
 cardiovascular    3     
 sensory           1     
 cns               1     
 metabolic         0

See running example at db<>fiddle.