I have 5 conditions (A-E) and a bunch of patient ID's. My data set is 2 columns: PatientID, Condition.
There are duplicate PatientID's with every new condition:
PatientID | Condition |
---|---|
456 | C |
456 | E |
279 | D |
123 | A |
123 | C |
123 | D |
187 | D |
296 | E |
296 | C |
I believe there are 31 different potential combinations (order doesn't matter) of those 5 conditions (ie A, AB, ABC, AC, ACDE, etc)
I want to count how many patients in each combination of conditions. So my results for the above would be - CE: 2 D: 2 ACD: 1
I'm more familiar with Excel but if this is better handled in SQL, I can do it there. I think I need to create a table of all the different combinations (any help on that would be appreciated too) and then do a count from there but I'm not sure if that's the best way.
CodePudding user response:
SQL Server solution
SELECT
conditions,
COUNT(*)
FROM (
SELECT
patientid,
STRING_AGG(condition, '') WITHIN GROUP (ORDER BY condition) conditions
FROM tbl
GROUP BY patientid
) c
GROUP BY conditions
Output
conditions | patient_count |
---|---|
ACD | 1 |
CE | 2 |
D | 2 |