Within a hospital encounter, a patient may be administered several different formulations of a medication, as shown here:
Encounter | Medication | Administration | Adm_Num |
---|---|---|---|
1 | A | 8/31/21 11:33 AM | 1 |
1 | B | 8/31/21 6:25 PM | 2 |
1 | C | 9/1/21 8:55 AM | 3 |
1 | D | 9/1/21 10:00 PM | 4 |
1 | B | 9/2/21 11:27 AM | 5 |
1 | B | 9/2/21 10:00 PM | 6 |
1 | B | 9/3/21 6:15 AM | 7 |
1 | B | 9/3/21 3:30 PM | 8 |
1 | D | 9/3/21 8:30 PM | 9 |
The task: For each encounter, I need to enumerate each formulation of the medication, like this:
Encounter | Medication | Administration | Adm_Num | Formulation |
---|---|---|---|---|
1 | A | 8/31/21 11:33 AM | 1 | 1 |
1 | B | 8/31/21 6:25 PM | 2 | 2 |
1 | C | 9/1/21 8:55 AM | 3 | 3 |
1 | D | 9/1/21 10:00 PM | 4 | 4 |
1 | B | 9/2/21 11:27 AM | 5 | 5 |
1 | B | 9/2/21 10:00 PM | 6 | 5 |
1 | B | 9/3/21 6:15 AM | 7 | 5 |
1 | B | 9/3/21 3:30 PM | 8 | 5 |
1 | D | 9/3/21 8:30 PM | 9 | 6 |
I need to preserve the administration-level granularity in order to report on those data (and I'm in good shape there).
One strategy I tried was a CTE to find a medication-level value for each medication, such as the first administration instant, then apply that to the more granular administration-level data, like this:
WITH f as (
SELECT a.Encounter, a.Medication, MIN(Administration) as First_Adm,
DENSE_RANK() OVER (PARTITION Encounter, Medication ORDER BY Administration) as Formulation
FROM a
GROUP BY a.Encounter, a.Medication
)
SELECT a.Encounter, a.Medication, a.Administration, a.Adm_Num, f.[First_Adm], f.Formulation
FROM a
INNER JOIN f ON (a.Encounter = f.Encounter AND a.Medication = f.Medication)
GROUP BY a.Encounter, a.Medication, a.Administration
ORDER BY a.Encounter, a.Medication, a.Administration
That worked well for the first few medications with single administrations, but it mishandled the reappearance of Medications B and D later in the encounter; it recognized Medication B from before, gave it the earlier MIN(Administration), and mislabeled it Formulation 2, as shown here:
Encounter | Medication | Administration | Adm_Num | First_Adm | Formulation |
---|---|---|---|---|---|
1 | A | 8/31/21 11:33 AM | 1 | 8/31/21 11:33 AM | 1 |
1 | B | 8/31/21 6:25 PM | 2 | 8/31/21 6:25 PM | 2 |
1 | C | 9/1/21 8:55 AM | 3 | 9/1/21 8:55 AM | 3 |
1 | D | 9/1/21 10:00 PM | 4 | 9/1/21 10:00 PM | 4 |
1 | B | 9/2/21 11:27 AM | 5 | 8/31/21 6:25 PM | 2 |
1 | B | 9/2/21 10:00 PM | 6 | 8/31/21 6:25 PM | 2 |
1 | B | 9/3/21 6:15 AM | 7 | 8/31/21 6:25 PM | 2 |
1 | B | 9/3/21 3:30 PM | 8 | 8/31/21 6:25 PM | 2 |
1 | D | 9/3/21 8:30 PM | 9 | 9/1/21 10:00 PM | 4 |
This is close, but not quite there. The remaining challenge is to account for repeated formulations.
How can I group the administration-level data by medication formulation groups, order the formulations chronologically (making use of the time data), and assign a sequential value to each formulation, even if some are repeated?
Thanks in advance for your help.
CodePudding user response:
This is a type of "gaps and islands" problem.
One method is to first use lag or lead to check the adjacent row to identify where the data (Medication) changes, followed by summing these values for all preceding rows for each row in the results to generate the desired sequence:
with c as (
select *,
case when
Lag(medication) over (partition by encounter order by Administration) = medication
then 0 else 1 end Changed
from t
)
select Encounter, Medication, Administration, Adm_Num,
Sum(changed) over(partition by Encounter order by Administration) Formulation
from c;