My current table is like the below, each patient has their visit start date and end date to a hospital, and they are administered a drug between admin_startdate and admin_enddate. For example, the first two rows mean, patient PT1 has two drug administrations, one between 01/08 and 01/10 & the other between 01/12 and 01/23, during her visit from 01/01 to 01/31.
ptid visit_start_date visit_end_date admin_startdate admin_enddate
PT1 2018-01-01 2018-01-31 2018-01-08 2018-01-10
PT1 2018-01-01 2018-01-31 2018-01-12 2018-01-23
PT2 2018-01-02 2018-01-18 2018-01-06 2018-01-11
PT2 2018-01-02 2018-01-18 2018-01-14 2018-01-17
What I would like to achieve is to lump together the drug administration that are too close together, say, the end date of the previous one is <= 2 days
of the start date of new one, and call that a whole episode, like below:
ptid visit_start_date visit_end_date admin_startdate admin_enddate episode_startdate episode_enddate
PT1 2018-01-01 2018-01-31 2018-01-08 2018-01-10 2018-01-08 2018-01-23
PT1 2018-01-01 2018-01-31 2018-01-12 2018-01-23 2018-01-08 2018-01-23
PT2 2018-01-02 2018-01-18 2018-01-06 2018-01-11 2018-01-06 2018-01-11
PT2 2018-01-02 2018-01-18 2018-01-14 2018-01-17 2018-01-14 2018-01-17
You can see that PT1's two administrations are lumped together with the same episode_startdate
and episode_enddate
, whereas PT2's two administrations are considered two separate episode.
I have a hard time to figure out how to do it in PostgreSQL (Redshift).
CodePudding user response:
This work in Postgres 14. Not tested for Redshift.
SELECT ptid, visit_start_date, visit_end_date, admin_startdate, admin_enddate
, min(admin_startdate) OVER (PARTITION BY visit_id, admin) AS episode_startdate
, max(admin_enddate) OVER (PARTITION BY visit_id, admin) AS episode_enddate
FROM (
SELECT *, count(*) FILTER (WHERE gap) OVER (PARTITION BY visit_id ORDER BY admin_startdate) AS admin
FROM (
SELECT *, admin_startdate - lag(admin_enddate) OVER (PARTITION BY visit_id ORDER BY admin_startdate) > 2 AS gap
FROM (
SELECT *, dense_rank() OVER (ORDER BY ptid, visit_start_date, visit_end_date) AS visit_id -- optional, to simplify
FROM tbl
) sub1
) sub2
) sub3
db<>fiddle here
The innermost subquery sub1
is only to compute a unique visit_id
- which should really be in your table instead of repeating (ptid, visit_start_date, visit_end_date )
over and over. Consider normalizing your design at least that much.
The next subquery sub2
checks for a gap that's greater than two days to the previous row in the same partition.
Subquery sub3
then counts those gaps to identify distinct administration periods (admin
)
In the outer SELECT
, min(admin_startdate)
and max(admin_enddate)
per administration period produce the desired episode dates.
See (with assorted links to more):
CodePudding user response:
CREATE TABLE tb1 AS (
SELECT *, admin_startdate - lag(admin_enddate) OVER (PARTITION BY visit_id ORDER BY admin_startdate) > 2 AS gap
FROM (
SELECT *, dense_rank() OVER (ORDER BY ptid, visit_start_date, visit_end_date) AS visit_id -- optional, to simplify
FROM tbl
) sub1
) ;
CREATE TABLE tb2 AS (
SELECT *, count(*) OVER (PARTITION BY visit_id ORDER BY admin_startdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS admin
FROM tb1
WHERE gap is True
)
;
CREATE TABLE tb3 AS (
SELECT tb1.ptid, tb1.visit_start_date, tb1.visit_end_date, tb1.admin_startdate, tb1.admin_enddate, tb1.visit_id, tb1.gap,
CASE WHEN tb2.admin is NULL THEN 0 else tb2.admin END AS admin
FROM tb1
LEFT JOIN tb2
ON tb1.ptid = tb2.ptid AND tb1.visit_start_date = tb2.visit_start_date AND tb1.visit_end_date = tb2.visit_end_date AND tb1.admin_startdate = tb2.admin_startdate AND tb1.admin_enddate = tb2.admin_enddate AND tb1.visit_id = tb2.visit_id
)
;
CREATE TABLE tb4 AS (
SELECT ptid, visit_start_date, visit_end_date, admin_startdate, admin_enddate
, min(admin_startdate) OVER (PARTITION BY visit_id, admin) AS episode_startdate
, max(admin_enddate) OVER (PARTITION BY visit_id, admin) AS episode_enddate
FROM tb3
)
This is an uglier version adapted from Erwin's answer for Redshift, which does not support FILTER operation. Tested correctly at least on db fiddle