I have the following table and data:
PatientID PatientName Diagnosed ReportDate ...
1 0
1 0
1 0
1 1
So there are multiple rows for each patient, as the reports come few times a day. Whenever the diagnosed field is changed to 1, for that patient, I'd like to get the past 3 days of data . So when Diagnosed ==1, get report time -3 days of data for each patient.
SELECT Patients.ReportDate
FROM Patients
WHERE Diagnosed = 1 and date > ReportDate - interval '3' day;
So getting the past 3 days of data, can be done with ReportDate - interval time, but how do I specify that for every patient (since multiple ids can be for that patient) based on the diagnosed field?
I usually do this filtering after getting csvs in python, but the data set is too large, so I'd like to filter before I convert them to dataframes.
CodePudding user response:
This should work:
SELECT p.*
FROM (
SELECT PatientID, ReportDate
FROM (
SELECT PatientID, ReportDate, Diagnosed
, lag(Diagnosed) OVER (PARTITION BY PatientID ORDER BY ReportDate) AS last_diagnosed
FROM Patients
) p0
WHERE Diagnosed = 1
AND last_diagnosed = 0
) d
JOIN Patients p USING (PatientID)
WHERE p.ReportDate BETWEEN d.ReportDate - interval '3 days' AND d.ReportDate
ORDER BY p.PatientID, p.ReportDate;
Subquery d
select rows where Diagnosed
just switched to 1
. Then self-join to select your time frame.
If multiple episodes for the same client can overlap, results get messy. You'll have to define how to deal with that.
For gaps-and-island basics, see:
CodePudding user response:
You can look at this another way, which is whether diagnosed = 1
in the next three days -- and take all rows where that is true:
select p.*
from (select p.*,
count(*) filter (where diagnosed = 1) over (partition by patientId order by reportDate range between interval '0 day' following and interval '3 day' following) as cnt_diagnosed_3
from patients p
) p
where cnt_diagnosed_3 > 0
order by patientId, reportDate;