I'm trying to find a solution to the following problem: Patients in the hospital stay in multiple departments (and rooms and beds) during their stay. They can be transferred internally and visit some departments twice or more.
So consider the following dataset:
ID | BEGINNING | ENDING | DEPARTMENT | ROOM | BED |
---|---|---|---|---|---|
1 | 13-08-2021 10:42:00 | 15-08-2021 14:34:00 | HAEM | 123 | 456 |
2 | 13-08-2021 14:34:00 | 15-08-2021 09:22:00 | HAEM | 345 | 789 |
3 | 15-08-2021 09:22:00 | 28-08-2021 21:09:00 | ACH | 111 | 222 |
4 | 28-08-2021 21:09:00 | 01-09-2021 13:34:00 | HAEM |
I want to find the longest uninterrupted stay during their visit which is described as the main department of the case.
If I just take the MIN(BEGINNING) and MAX(ENDING) and group by the department the longest stay would be in the HAEM department (13-08 - 01-09) but in reality, the stay in the ACH department is the longest uninterrupted stay (15-08 - 28-08).
How can I accomplish that? I think I have to use LEAD to see when the department changes, but how can I set the partition then to find the MAX and MIN dates?
What I want to return is:
DEPARTMENT | BEGINNING | END | DURATION |
---|---|---|---|
HAEM | 13-08-2021 10:42:00 | 15-08-2021 09:22:00 | 2 |
ACH | 15-08-2021 09:22:00 | 28-08-2021 21:09:00 | 13 |
HAEN | 28-08-2021 21:09:00 | 01-09-2021 13:34:00 | 4 |
Out of that, I can get the department I'm looking for.
CodePudding user response:
Here is a solution using the match_recognize
clause. The output shows all continuous stays in the same department (regardless of room and bed) for each patient. In the sample data I have just one patient, but the query should work without modification for all patients at once. If the output is needed for just one patient, that can be added as a filter (where
clause).
I created slightly different data for testing - adding a column for patient id, and changing the ending
for the first row so that there are no overlapping stays for the same patient (which make no sense in this use case).
Note that the choice of data type for beginning
and ending
is suboptimal; it should be changed to date
if possible. As it is, I had to use cast(... as date)
in the query. Without these casts, the result would be of interval day to second
data type - which may be OK, but aggregation (such as taking the max()
) is not supported for the interval data type, for reasons known only to Oracle.
Data for testing:
drop table case_departments purge;
create table case_departments(
stay_id number generated by default on null as identity,
patient_id number,
beginning timestamp,
ending timestamp,
department varchar2(50),
room varchar2(50),
bed varchar2(50)
);
insert into case_departments (patient_id , beginning, ending, department, room, bed)
values (1239,
to_date('13-08-2021 10:42:00', 'DD-MM-YYYY HH24:MI:SS'),
to_date('13-08-2021 14:34:00', 'DD-MM-YYYY HH24:MI:SS'),
'HAEM',
'123',
'456');
insert into case_departments (patient_id, beginning, ending, department, room, bed)
values (1239,
to_date('13-08-2021 14:34:00', 'DD-MM-YYYY HH24:MI:SS'),
to_date('15-08-2021 09:22:00', 'DD-MM-YYYY HH24:MI:SS'),
'HAEM',
'345',
'789');
insert into case_departments (patient_id, beginning, ending, department, room, bed)
values (1239,
to_date('15-08-2021 09:22:00', 'DD-MM-YYYY HH24:MI:SS'),
to_date('28-08-2021 21:09:00', 'DD-MM-YYYY HH24:MI:SS'),
'ACH',
'111',
'222');
insert into case_departments (patient_id, beginning, ending, department)
values (1239,
to_date('28-08-2021 21:09:00', 'DD-MM-YYYY HH24:MI:SS'),
to_date('01-09-2021 13:34:00', 'DD-MM-YYYY HH24:MI:SS'),
'HAEM');
commit;
Which looks like this:
select * from case_departments;
STAY_ID PATIENT_ID BEGINNING ENDING DEPARTMENT ROOM BED
------- ---------- ----------------------- ----------------------- ---------- ---- ----
1 1329 2021-08-13 10:42:00.000 2021-08-15 14:34:00.000 HAEM 123 456
2 1239 2021-08-15 14:34:00.000 2021-08-16 09:22:00.000 HAEM 345 789
3 1239 2021-08-16 09:22:00.000 2021-08-28 21:09:00.000 ACH 111 222
4 1239 2021-08-28 21:09:00.000 2021-09-01 13:34:00.000 HAEM
Query and output:
select patient_id, department, beginning, ending,
round(duration, 2) as duration
from case_departments
match_recognize(
partition by patient_id
order by beginning
measures f.department as department,
f.beginning as beginning,
last(ending) as ending,
cast(last(ending) as date)
- cast(first(beginning) as date) as duration
pattern ( f m* )
define m as department = f.department and beginning = prev(ending)
);
PATIENT_ID DEPARTMENT BEGINNING ENDING DURATION
---------- ---------- ----------------------- ----------------------- ----------
1239 HAEM 2021-08-13 10:42:00.000 2021-08-15 09:22:00.000 1.94
1239 ACH 2021-08-15 09:22:00.000 2021-08-28 21:09:00.000 13.49
1239 HAEM 2021-08-28 21:09:00.000 2021-09-01 13:34:00.000 3.68
CodePudding user response:
You can do:
select
grp,
max(department) as department,
min(beginning) as beginning,
max(ending) as ending,
max(ending) - min(beginning) as duration
from (
select x.*,
sum(inc) over(order by beginning) as grp
from (
select d.*,
case when beginning = lag(ending) over(order by beginning)
and department = lag(department) over(order by beginning)
then 0 else 1 end as inc
from case_departments d
) x
) y
group by grp
Result:
GRP DEPARTMENT BEGINNING ENDING DURATION
--- ---------- ------------------ ------------------ ------------
1 HAEM 13-AUG-21 10.42.00 15-AUG-21 14.34.00 02 03:52:00
2 ACH 15-AUG-21 09.22.00 28-AUG-21 21.09.00 13 11:47:00
3 HAEM 28-AUG-21 21.09.00 01-SEP-21 13.34.00 03 16:25:00
See running example at db<>fiddle.
Note: I would assume there's a typo in the first row, and that the ending date of the first row is 13-08-2021 14:34:00
. Otherwise the query would need to account for overlapping.