Home > front end >  SQL date subtraction per subject with index date condition
SQL date subtraction per subject with index date condition

Time:12-07

I have a table of subject records where each subject as 1 or more rows that have a begin/end date along with an index date:

SUBJECT BEGIN_DATE END_DATE INQUIRY_DATE
1 1988-01-01 2010-04-05 2022-05-06
1 2010-04-06 2022-10-02 2022-05-06
2 1996-09-24 2005-08-08 2022-10-01
2 2016-11-21 2022-04-04 2022-10-01
3 2005-01-01 2021-02-12 2022-03-21
4 1999-12-31 2015-07-16 2022-08-15
4 2015-07-20 2020-04-01 2022-08-15
4 2020-12-31 2022-10-01 2022-08-15

What I want to find out is, for each subject, are the 3 months prior to the inquiry date covered continuously by the begin/end ranges (with no more than a 14-day gap between the end date and the next start date)? For example subject 1: they have continuous observation between 1988-01-01 and 2022-10-02, so yes this subject's inquiry date has 3 months covered prior; whereas, subject 2 does not.

Working in SQL (snowflake) and started with something like this (obviously not complete, just not sure how to get to where I want):

with lookback as (
select *, INQUIRY_DATE - interval '3 months' as look_back_3m from tbl)

select *, case when diff >= 14 then 1 else 0 end as flag from (
select SUBJECT, BEGIN_DATE - lag(END_DATE) over(partition by subject order by BEGIN_DATE) as diff from tbl) z

CodePudding user response:

The first step is that you want to filter out any rows, that end prior to 4 months prior to your INQUIRY_DATE or start 1 month after your INQUIRY_DATE, as they are outside the 3 month 14 day rules you have defined.

So some enhanced data:

with your_table(subject, begin_date, end_date, inquiry_date) as (
    select * from values
    (1, '1988-01-01'::date, '2010-04-05'::date, '2022-05-06'::date),
    (1, '2010-04-06'::date, '2022-10-02'::date, '2022-05-06'::date),
    (2, '1996-09-24'::date, '2005-08-08'::date, '2022-10-01'::date),
    (2, '2016-11-21'::date, '2022-04-04'::date, '2022-10-01'::date),
    (3, '2005-01-01'::date, '2021-02-12'::date, '2022-03-21'::date),
    (4, '1999-12-31'::date, '2015-07-16'::date, '2022-08-15'::date),
    (4, '2015-07-20'::date, '2020-04-01'::date, '2022-08-15'::date),
    (4, '2020-12-31'::date, '2022-10-01'::date, '2022-08-15'::date),
    (5, '2022-05-01'::date, '2022-06-01'::date, '2022-08-15'::date),
    (5, '2022-06-01'::date, '2022-07-01'::date, '2022-08-15'::date),
    (5, '2022-07-01'::date, '2022-08-01'::date, '2022-08-15'::date),
    (5, '2022-08-01'::date, '2022-10-01'::date, '2022-08-15'::date),
    (6, '2022-05-01'::date, '2022-05-10'::date, '2022-08-15'::date),
    (6, '2022-06-01'::date, '2022-06-10'::date, '2022-08-15'::date),
    (6, '2022-07-01'::date, '2022-07-10'::date, '2022-08-15'::date),
    (6, '2022-08-01'::date, '2022-10-01'::date, '2022-08-15'::date)
)
select * 
from your_table
where begin_date <= inquiry_date
   and end_date >= dateadd(month, -4, inquiry_date);

gives:

SUBJECT BEGIN_DATE END_DATE INQUIRY_DATE
1 2010-04-06 2022-10-02 2022-05-06
4 2020-12-31 2022-10-01 2022-08-15
5 2022-05-01 2022-06-01 2022-08-15
5 2022-06-01 2022-07-01 2022-08-15
5 2022-07-01 2022-08-01 2022-08-15
5 2022-08-01 2022-10-01 2022-08-15
6 2022-05-01 2022-05-10 2022-08-15
6 2022-06-01 2022-06-10 2022-08-15
6 2022-07-01 2022-07-10 2022-08-15
6 2022-08-01 2022-10-01 2022-08-15

Then you want to find chains of gaps and make them contentious.

something like:

select *
    ,lag(end_date) over(partition by subject order by begin_date) as prior_end
    ,datediff(days, prior_end, begin_date) as days_of_gap
    ,(begin_date - 14) <= prior_end as prior_overlapped
    ,iff(prior_overlapped, null, begin_date) as group_start_date
from your_table
where begin_date <= inquiry_date
   and end_date >= dateadd(month, -4, inquiry_date);

Then you want to ask, is any chain start prior/on INQUIRY_DATE - 3 months and ends on/after INQUIRY_DATE.

select
    subject
    ,begin_date
    ,end_date
    ,group_date
    ,dateadd(month, -3, inquiry_date) as req_start_date
    ,datediff(days, group_date, end_date) as days_of_streak
    ,end_date >= inquiry_date as rule_1
    ,group_date <= dateadd(month, -3, inquiry_date) as rule_2
from (
    select *
        ,lag(group_start_date) ignore nulls over (partition by subject order by begin_date) as l_g_d
        ,nvl(group_start_date, l_g_d) as group_date
    from (
        select *
            ,lag(end_date) over(partition by subject order by begin_date) as prior_end
            ,iff((begin_date - 14) <= prior_end, null, begin_date) as group_start_date
        from your_table
        where begin_date <= inquiry_date
           and end_date >= dateadd(month, -4, inquiry_date)
    )
)
order by subject, begin_date
;

gives:

SUBJECT BEGIN_DATE END_DATE GROUP_DATE REQ_START_DATE DAYS_OF_STREAK RULE_1 RULE_2
1 2010-04-06 2022-10-02 2010-04-06 2022-02-06 4,562 TRUE TRUE
4 2020-12-31 2022-10-01 2020-12-31 2022-05-15 639 TRUE TRUE
5 2022-05-01 2022-06-01 2022-05-01 2022-05-15 31 FALSE TRUE
5 2022-06-01 2022-07-01 2022-05-01 2022-05-15 61 FALSE TRUE
5 2022-07-01 2022-08-01 2022-05-01 2022-05-15 92 FALSE TRUE
5 2022-08-01 2022-10-01 2022-05-01 2022-05-15 153 TRUE TRUE
6 2022-05-01 2022-05-10 2022-05-01 2022-05-15 9 FALSE TRUE
6 2022-06-01 2022-06-10 2022-06-01 2022-05-15 9 FALSE FALSE
6 2022-07-01 2022-07-10 2022-07-01 2022-05-15 9 FALSE FALSE
6 2022-08-01 2022-10-01 2022-08-01 2022-05-15 61 TRUE FALSE

and we want rows that both rules are true..

thus:

select
    subject
    ,group_date
    ,inquiry_date
from (
    select *
        ,lag(group_start_date) ignore nulls over (partition by subject order by begin_date) as l_g_d
        ,nvl(group_start_date, l_g_d) as group_date
    from (
        select *
            ,lag(end_date) over(partition by subject order by begin_date) as prior_end
            ,iff((begin_date - 14) <= prior_end, null, begin_date) as group_start_date
        from your_table
        where begin_date <= inquiry_date
           and end_date >= dateadd(month, -4, inquiry_date)
    )
)
where end_date >= inquiry_date
    and group_date <= dateadd(month, -3, inquiry_date)
order by subject, begin_date
;

gives:

SUBJECT GROUP_DATE INQUIRY_DATE
1 2010-04-06 2022-05-06
4 2020-12-31 2022-08-15
5 2022-05-01 2022-08-15
  • Related