How to check max difference in dates between events and create new column in SAS Enterprise Guide?-CodePudding

I have table in SAS Enterprise Guide like below.

Data types and meaning:

ID - numeric - ID of client
DT - date - date of change
OFFER_1 - charcter - current offer
OFFER_2 - character - offer after change

Values in original dataset are not sorted, but it can be sorted if it is important for solution.

ID   | DT        | OFFER_1  | OFFER_2
-----|-----------|----------|----------
123  | 01MAY2020 | PR       | PR
123  | 05MAY2020 | PR       | P
123  | 10MAY2020 | P        | P
123  | 11MAY2020 | P        | P
123  | 20MAY2020 | P        | PR
123  | 21MAY2020 | PR       | M
123  | 25MAY2020 | M        | M
777  | 30MAY2020 | PR       | M
223  | 02JAN2020 | PR       | PR
223  | 15MAR2020 | PR       | PR
402  | 20MAR2020 | M        | M
33   | 11AUG2020 | M        | PR
11   | 20JAN2020 | PR       | M
11   | 05FEB2020 | M        | M

And I need to create new column "COL1" where will be information:

If client change offer from PR to P or M then count how many maximum days he lasted before returning to PR again, or how many days he is already on P or M after the change, if he has not returned to PR
If he did not change PR to P or M "COL1" = 0

So, as a result I need something like below:

ID   | DT        | OFFER_1  | OFFER_2  | COL1
-----|-----------|----------|----------|---------
123  | 01MAY2020 | PR       | PR       | 15
123  | 05MAY2020 | PR       | P        | 15
123  | 10MAY2020 | P        | P        | 15
123  | 11MAY2020 | P        | P        | 15
123  | 20MAY2020 | P        | PR       | 15
123  | 21MAY2020 | PR       | M        | 15
123  | 25MAY2020 | M        | M        | 15
777  | 30MAY2020 | PR       | M        | 1
223  | 02JAN2020 | PR       | PR       | 0
223  | 15MAR2020 | PR       | PR       | 0
402  | 20MAR2020 | M        | M        | 0
33   | 11AUG2020 | M        | PR       | 0
11   | 20JAN2020 | PR       | M        | 16
11   | 05FEB2020 | M        | M        | 16

Because:

ID = 123, has 15 in "COL1" - because changed PR to P or M and lasted a maximum of 15 days (from 05MAY2020 till 20MAY2020) before return to PR again, changed PR to P or M 2 times but the 2nd time the change lasted 4 days and 15 > 4
ID = 777, has 1 in "COL1" - because changed PR to P or M and lasted a maximum of 1 day (30MAY2020)
ID = 223, has 0 in "COL1" - because he did not change PR to P or M
ID = 402, has 0 in "COL1" - because he did not change PR to P or M
ID = 33, has 0 in "COL1" - because he did not change PR to P or M
ID = 11, has 16 in "COL1" - because changed PR to P or M and lasted a maximum of 16 days (from 20JAN2020 till 05FEB2020) and did not return to PR again

@Stu Sztukowski - Below exaplmes from my real data where your code does not work. I add also how it should be :)

Examples where code works wrong:

ID	DT	OFFER_1	OFFER_2	COL1
1020	01SEP2020	P	P	1 -> number of days between 01SEP2020 and 02SEP2020 but there was not changes from PR to P or M, so this client does not interes for us, it should be 0
1020	02SEP2020	P	P	1
2030	29JUL2022	P	P	50 -> number of days between 20JUL2022 and 17SEP2022 but there was not changes from PR to P or M, so this client does not interes for us, it should be 0
2030	15JUL2022	P	P	50
2030	17SEP2022	P	P	50

How it should be: Because only change from PR to P or M are interested for us, example at the bottom of this post presents it. Below examples are similar like my examples ID = 223 or 402 from the bottom of this post :)

ID	DT	OFFER_1	OFFER_2
1020	01SEP2020	P	P
1020	02SEP2020	P	P
2030	29JUL2022	P	P
2030	15JUL2022	P	P
2030	17SEP2022	P	P

How can I do that in SAS Enterprise Guide in PROC SQL or in normal SAS code ?

CodePudding user response：

If offer_1 is P or M, then we just need to take the difference between the current date and the previous date. We'll use some rules to calculate the differences based on your rules. From there we can join the max values back with the original data.

proc sort data=have;
    by id dt;
run;

data date_difs;
    set have;
    by id dt;
    retain flag_pr_pm start_dt;

    if(first.id) then call missing(flag_pr_pm, start_dt);

    if(offer_1 = 'PR' AND offer_2 IN('P', 'M') ) then do;
        flag_pr_pm = 1;
        start_dt   = dt;
    end;

    if(   (offer_1 IN('P', 'M') AND offer_2 = 'PR')
       OR (flag_pr_pm AND last.id AND NOT first.id)
      ) 
    then do;
        flag_pr_pm  = 0;
        total_days = dt - start_dt;
    end;

    if(    first.id 
       AND last.id 
       AND offer_1 = 'PR' 
       AND offer_2 IN ('P', 'M')
      ) 
    then total_days = 1;

    format start_dt date9.;
run;

Which gets you:

id    dt         offer_1    offer_2   flag_pr_pm    start_dt    total_days
11    20JAN2020  PR         M         1             20JAN2020   .
11    05FEB2020  M          M         0             20JAN2020   16
33    11AUG2020  M          PR        0             .           .
123   01MAY2020  PR         PR        .             .           .
123   05MAY2020  PR         P         1             05MAY2020   .
123   10MAY2020  P          P         1             05MAY2020   .
123   11MAY2020  P          P         1             05MAY2020   .
123   20MAY2020  P          PR        0             05MAY2020   15
123   21MAY2020  PR         M         1             21MAY2020   .
123   25MAY2020  M          M         0             21MAY2020   4
223   02JAN2020  PR         PR        .             .           .
223   15MAR2020  PR         PR        .             .           .
402   20MAR2020  M          M         .             .           .
777   30MAY2020  PR         M         1             30MAY2020   1
1020  01SEP2020  P          P         .             .           .
1020  02SEP2020  P          P         .             .           .
2030  15JUL2022  P          P         .             .           .
2030  29JUL2022  P          P         .             .           .
2030  17SEP2022  P          P         .             .           .

Now join the max value by ID back to the original data:

proc sql;
    create table want as
        select *, col1
        from have as t1
        LEFT JOIN
             (select id, max(total_days) as col1
              from date_difs
              group by id
             ) as t2
        ON t1.id = t2.id
    ;
quit;

This gives you the exact solution within your sample data.