Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t
:
-- --------- -------
|id|treatment|outcome|
-- --------- -------
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
-- --------- -------
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
----------------------- -----
|ever treated |count|
----------------------- -----
|0 |1 |
|1 |3 |
----------------------- -----
First, identify id
that have ever been treated. Being "ever treated" means having any row with treatment = 1
.
Second, count rows with outcome = 1
for each of those two groups. From my original table, the id
s who are "ever treated" have a total of 3 outcome
= 1, and the "never treated", so to speak, have 1 `outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
--------- -----
|treatment|count|
--------- -----
|0 |2 |
|1 |4 |
--------- -----
CodePudding user response:
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id
, max(treatment) AS ever_treated
, count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
ever_treated | count
-------------- -------
0 | 1
1 | 3
db<>fiddle here
Read:
- For those who got no treatment at all (all
treatment = 0
), we see1
xoutcome = 1
. - For those who got any treatment (at least one
treatment = 1
), we see3
xoutcome = 1
.
Would be simpler and faster with proper boolean
values instead of integer
.
CodePudding user response:
here is an easy to follow subquery logic that works with integer:
select subq.treatment, count(subq.outcome)
from (select * from t where outcome = 1 and
(id in (select distinct(id) from t where treatment = 1))) as subq
group by subq.treatment ;