Oracle - Calculate max duration with multiple periods-CodePudding

I'm trying to find a solution to the following problem: Patients in the hospital stay in multiple departments (and rooms and beds) during their stay. They can be transferred internally and visit some departments twice or more.

So consider the following dataset:

ID	BEGINNING	ENDING	DEPARTMENT	ROOM	BED
1	13-08-2021 10:42:00	15-08-2021 14:34:00	HAEM	123	456
2	13-08-2021 14:34:00	15-08-2021 09:22:00	HAEM	345	789
3	15-08-2021 09:22:00	28-08-2021 21:09:00	ACH	111	222
4	28-08-2021 21:09:00	01-09-2021 13:34:00	HAEM

I want to find the longest uninterrupted stay during their visit which is described as the main department of the case.

If I just take the MIN(BEGINNING) and MAX(ENDING) and group by the department the longest stay would be in the HAEM department (13-08 - 01-09) but in reality, the stay in the ACH department is the longest uninterrupted stay (15-08 - 28-08).

How can I accomplish that? I think I have to use LEAD to see when the department changes, but how can I set the partition then to find the MAX and MIN dates?

What I want to return is:

DEPARTMENT	BEGINNING	END	DURATION
HAEM	13-08-2021 10:42:00	15-08-2021 09:22:00	2
ACH	15-08-2021 09:22:00	28-08-2021 21:09:00	13
HAEN	28-08-2021 21:09:00	01-09-2021 13:34:00	4

Out of that, I can get the department I'm looking for.

SQL Fiddle

CodePudding user response：

Here is a solution using the match_recognize clause. The output shows all continuous stays in the same department (regardless of room and bed) for each patient. In the sample data I have just one patient, but the query should work without modification for all patients at once. If the output is needed for just one patient, that can be added as a filter (where clause).

I created slightly different data for testing - adding a column for patient id, and changing the ending for the first row so that there are no overlapping stays for the same patient (which make no sense in this use case).

Note that the choice of data type for beginning and ending is suboptimal; it should be changed to date if possible. As it is, I had to use cast(... as date) in the query. Without these casts, the result would be of interval day to second data type - which may be OK, but aggregation (such as taking the max()) is not supported for the interval data type, for reasons known only to Oracle.

Data for testing:

drop table case_departments purge;

create table case_departments(
  stay_id    number generated by default on null as identity,
  patient_id number,
  beginning  timestamp,
  ending     timestamp,
  department varchar2(50),
  room       varchar2(50),
  bed        varchar2(50)
);

insert into case_departments (patient_id , beginning, ending, department, room, bed) 
values (1239,
        to_date('13-08-2021 10:42:00', 'DD-MM-YYYY HH24:MI:SS'), 
        to_date('13-08-2021 14:34:00', 'DD-MM-YYYY HH24:MI:SS'), 
        'HAEM', 
        '123', 
        '456');
        
insert into case_departments (patient_id, beginning, ending, department, room, bed) 
values (1239,
        to_date('13-08-2021 14:34:00', 'DD-MM-YYYY HH24:MI:SS'), 
        to_date('15-08-2021 09:22:00', 'DD-MM-YYYY HH24:MI:SS'), 
        'HAEM', 
        '345', 
        '789');
         
insert into case_departments (patient_id, beginning, ending, department, room, bed) 
values (1239,
        to_date('15-08-2021 09:22:00', 'DD-MM-YYYY HH24:MI:SS'), 
        to_date('28-08-2021 21:09:00', 'DD-MM-YYYY HH24:MI:SS'), 
        'ACH', 
        '111', 
        '222');
        
insert into case_departments (patient_id, beginning, ending, department) 
values (1239,
        to_date('28-08-2021 21:09:00', 'DD-MM-YYYY HH24:MI:SS'), 
        to_date('01-09-2021 13:34:00', 'DD-MM-YYYY HH24:MI:SS'), 
        'HAEM');

commit;

Which looks like this:

select * from case_departments;


STAY_ID PATIENT_ID BEGINNING               ENDING                  DEPARTMENT ROOM BED 
------- ---------- ----------------------- ----------------------- ---------- ---- ----
      1       1329 2021-08-13 10:42:00.000 2021-08-15 14:34:00.000 HAEM       123  456 
      2       1239 2021-08-15 14:34:00.000 2021-08-16 09:22:00.000 HAEM       345  789 
      3       1239 2021-08-16 09:22:00.000 2021-08-28 21:09:00.000 ACH        111  222 
      4       1239 2021-08-28 21:09:00.000 2021-09-01 13:34:00.000 HAEM

Query and output:

select patient_id, department, beginning, ending,
       round(duration, 2) as duration
from   case_departments
match_recognize(
  partition by patient_id
  order     by beginning
  measures  f.department as department,  
            f.beginning  as beginning,
            last(ending) as ending,
            cast(last(ending) as date)
              - cast(first(beginning) as date) as duration
  pattern   ( f m* )
  define    m as department = f.department and beginning = prev(ending)
);

PATIENT_ID DEPARTMENT BEGINNING               ENDING                    DURATION
---------- ---------- ----------------------- ----------------------- ----------
      1239 HAEM       2021-08-13 10:42:00.000 2021-08-15 09:22:00.000       1.94
      1239 ACH        2021-08-15 09:22:00.000 2021-08-28 21:09:00.000      13.49
      1239 HAEM       2021-08-28 21:09:00.000 2021-09-01 13:34:00.000       3.68

CodePudding user response：

You can do:

select
  grp,
  max(department) as department,
  min(beginning) as beginning,
  max(ending) as ending,
  max(ending) - min(beginning) as duration
from (
  select x.*,
    sum(inc) over(order by beginning) as grp
  from (
    select d.*,
      case when beginning = lag(ending) over(order by beginning)
            and department = lag(department) over(order by beginning)
        then 0 else 1 end as inc
    from case_departments d
  ) x
) y
group by grp

Result:

 GRP  DEPARTMENT  BEGINNING           ENDING              DURATION
 ---  ----------  ------------------  ------------------  ------------
 1    HAEM        13-AUG-21 10.42.00  15-AUG-21 14.34.00   02 03:52:00
 2    ACH         15-AUG-21 09.22.00  28-AUG-21 21.09.00   13 11:47:00
 3    HAEM        28-AUG-21 21.09.00  01-SEP-21 13.34.00   03 16:25:00

See running example at db<>fiddle.

Note: I would assume there's a typo in the first row, and that the ending date of the first row is 13-08-2021 14:34:00. Otherwise the query would need to account for overlapping.