Select rows for last n days after event occurs-CodePudding

I have the following table and data:

PatientID   PatientName   Diagnosed  ReportDate   ...
1                         0
1                         0
1                         0
1                         1

So there are multiple rows for each patient, as the reports come few times a day. Whenever the diagnosed field is changed to 1, for that patient, I'd like to get the past 3 days of data . So when Diagnosed ==1, get report time -3 days of data for each patient.

SELECT Patients.ReportDate
FROM Patients 
WHERE Diagnosed = 1 and date > ReportDate - interval '3' day;

So getting the past 3 days of data, can be done with ReportDate - interval time, but how do I specify that for every patient (since multiple ids can be for that patient) based on the diagnosed field?

I usually do this filtering after getting csvs in python, but the data set is too large, so I'd like to filter before I convert them to dataframes.

CodePudding user response：

This should work:

SELECT p.*
FROM  (
   SELECT PatientID, ReportDate
   FROM  (
      SELECT PatientID, ReportDate, Diagnosed
           , lag(Diagnosed) OVER (PARTITION BY PatientID ORDER BY ReportDate) AS last_diagnosed
      FROM   Patients
      ) p0
   WHERE  Diagnosed = 1
   AND    last_diagnosed = 0
   ) d
JOIN   Patients p USING (PatientID)
WHERE  p.ReportDate BETWEEN d.ReportDate - interval '3 days' AND d.ReportDate
ORDER  BY p.PatientID, p.ReportDate;

Subquery d select rows where Diagnosed just switched to 1. Then self-join to select your time frame.

If multiple episodes for the same client can overlap, results get messy. You'll have to define how to deal with that.

For gaps-and-island basics, see:

Select longest continuous sequence

CodePudding user response：

You can look at this another way, which is whether diagnosed = 1 in the next three days -- and take all rows where that is true:

select p.*
from (select p.*,
             count(*) filter (where diagnosed = 1) over (partition by patientId order by reportDate range between interval '0 day' following and interval '3 day' following) as cnt_diagnosed_3
      from patients p
     ) p
where cnt_diagnosed_3 > 0
order by patientId, reportDate;