I asked a very similar question here and now trying to handle an edge case that came up. In the original problem, the dates worked out nicely in that associated cases had matching open_date
/close_date
pairs. But in the example below, case4 and case3 are supposed to be linked but case3 closed after case4 opened. We are able to tell that these two cases should be linked because case3 cannot be a final case in a sequence because it has the status deferred
and its close_date
is in close proximity to when the next deferred case was opened. I'm wondering if there is a good way to consider this condition when matching the cases by open_date
/close_date
?
case_id open_date close_date user_id type status
case5 2021-06-01 2021-08-25 user1 request complete
case4 2021-05-05 2021-06-01 user1 request deferred
case3 2021-03-01 2021-05-12 user1 request deferred
case2 2020-09-15 2021-03-01 user1 request deferred
case1 2020-09-01 2020-09-15 user1 request deferred
CodePudding user response:
The solution is just a tiny adjustment (based on the original solution provided by @MikhailBerlyant) to allow / detect overlap of successive cases.
The fiddle (with original and new problem)
Notice the inequality: open_date > lead(close_date)
with your_table (case_id, open_date, close_date, user_id, type) as (
select 'case5', '2021-06-01', '2021-08-25', 'user1', 'request' union all
select 'case4', '2021-05-05', '2021-06-01', 'user1', 'request' union all
select 'case3', '2021-03-01', '2021-05-12', 'user1', 'request' union all
select 'case2', '2020-09-15', '2021-03-01', 'user1', 'request' union all
select 'case1', '2020-09-01', '2020-09-15', 'user1', 'request'
)
select *,
case row_number() over(partition by user_id, type, map_id order by open_date)
when 1 then 'new case'
when count(1) over(partition by user_id, type, map_id) then 'last deferred case'
else 'deferred case'
end as status
from (
select *,
SUM(new_case) over(partition by user_id, type order by open_date) as map_id
from (
select *,
COALESCE(open_date > lead(close_date) over(partition by user_id, type order by open_date desc), 1) new_case
from your_table
) AS t1
) AS t2
ORDER BY open_date DESC
;
The result:
case_id | open_date | close_date | user_id | type | new_case | map_id | status |
---|---|---|---|---|---|---|---|
case5 | 2021-06-01 | 2021-08-25 | user1 | request | 0 | 1 | last deferred case |
case4 | 2021-05-05 | 2021-06-01 | user1 | request | 0 | 1 | deferred case |
case3 | 2021-03-01 | 2021-05-12 | user1 | request | 0 | 1 | deferred case |
case2 | 2020-09-15 | 2021-03-01 | user1 | request | 0 | 1 | deferred case |
case1 | 2020-09-01 | 2020-09-15 | user1 | request | 1 | 1 | new case |
CodePudding user response:
You will have to make a small adjustment to your BigQuery
code on the inner FROM
statement:
with your_table as (
select 'case5' as case_id, '2021-06-01' as open_date, '2021-08-25' as close_date, 'user1' as user_id, 'request' as type union all
select 'case4', '2021-05-05', '2021-06-01', 'user1', 'request' union all
select 'case3', '2021-03-01', '2021-05-12', 'user1', 'request' union all
select 'case2', '2020-09-15', '2021-03-01', 'user1', 'request' union all
select 'case1', '2020-09-01', '2020-09-15', 'user1', 'request'
)
select *,
case row_number() over(partition by user_id, type, map_id order by open_date)
when 1 then 'new case'
when count(1) over(partition by user_id, type, map_id) then 'last deferred case'
else 'deferred case'
end as status
from (
select * except(new_case),
countif(new_case) over(partition by user_id, type order by open_date) as map_id
from (
select *,
case when open_date <= lead(close_date) over(partition by user_id, type order by open_date desc) then false
when open_date != lead(close_date) over(partition by user_id, type order by open_date desc) then true
else False end new_case
from your_table
)
)
I have tested with old scenario and new scenario and a combined one. This should address your issue. I consult the following documentation for it: