Home > Blockchain >  How to create flag inform whether was some change in two columns on ID in PROC SQL in SAS Enterprise
How to create flag inform whether was some change in two columns on ID in PROC SQL in SAS Enterprise

Time:09-08

I have table in SAS Enterprise Guide like below:

ID  | val1| val2
----|-----|-----
123 | M   | M
123 | M   | P
123 | P   | P
444 | PR  | PR
444 | PR  | PR
567 | PR  | M
567 | M   | M
99  | PR  | P

And I need to creeate new column "col1" with values 0/1:

  • If some ID never changed the value PR from column "val1" to the value of M or P in column "val2" then this ID has 1 else 0

So, as a result I need something like below:

ID  | val1| val2| col1
----|-----|-----|----
123 | M   | M   | 1
123 | M   | P   | 1
123 | P   | P   | 1
444 | PR  | PR  | 1
444 | PR  | PR  | 1
567 | PR  | M   | 0
567 | M   | M   | 0
99  | PR  | P   | 0

Because:

  • 123 - has 1 in "col1" because has never changed PR to M or P
  • 444 - has 1 in "col1" because has never changed PR to M or P
  • 567 - has 0 because changed PR to M
  • 99 - has 0 because changed PR to P

How can I do that in PROC SQL in SAS Enterprise Guide ?

CodePudding user response:

So you want a single value at the ID level that is replicated onto all observations for that level of ID? PROC SQL makes that easy because it will automatically remerge aggregate values back to the detailed observations.

It sounds like the test you want is

val1='PR' and val2 in ('M' 'P')

You then want the overall result to be 1 (TRUE) when that expression is never true.

data have ;
  input ID val1 $ val2 $ EXPECT ;
cards;
123  M    M    1
123  M    P    1
123  P    P    1
444  PR   PR   1
444  PR   PR   1
567  PR   M    0
567  M    M    0
99   PR   P    0
;

proc sql;
  create table want as 
    select *
         , min(not (val1='PR' and val2 in ('M' 'P'))) as COL1
    from have
    group by id
  ;
quit;

Result:

Obs     ID    val1    val2    EXPECT    COL1

 1      99     PR      P         0        0
 2     123     P       P         1        1
 3     123     M       M         1        1
 4     123     M       P         1        1
 5     444     PR      PR        1        1
 6     444     PR      PR        1        1
 7     567     M       M         0        0
 8     567     PR      M         0        0

CodePudding user response:

Solution if the can be changed:

proc sort data=have;
    by ID;
run;
data want;
    * for every ID, read the data you have twice *;
    set have (in=first_pass) have (in=last_pass);
    by ID;
    
    * From the first pass, remember if any observation "changed" from PR to P or M *;
    retain col1;
    if first.ID then col1 = 0;
    if val1 eq 'PR' and val2 in ('P', 'M') then col1 = 1;
    
    * only output the results from the second pass *;
    if last_pass;
run;
  • Related