Home > Mobile >  proc sql left join based on condition with missing values
proc sql left join based on condition with missing values

Time:06-17

I am trying to combine two tables using left join for Reconciliation and there are two variables which I want to use for match condition. However, there is an additional requirement for one of the matched variable. If there is a match for second variable then pull that row from second table, if not then put the corresponding second value as missing and still extract other variables from the second table. But I am getting missing values if no match exist for second var. I know this is natural sql behavior. But I just want to try if anything can be done.

Here is a sample code:

    data a;
     subj=1; dat="01jan2022"d; output;
     subj=1; dat="01feb2022"d; output;
     subj=1; dat="05mar2022"d; output;
     subj=2; dat="10may2022"d; output;
     subj=2; dat="11jun2022"d; output;
    run;
    
    data b;
     subj=1; dat_new="01jan2022"d; other_var=1; output;
     subj=1; dat_new="01feb2022"d; other_var=2; output;
     subj=1; dat_new="05mar2022"d; other_var=3; output;
     subj=2; dat_new=.; other_var=11; output;
     subj=2; dat_new="11jun2022"d; other_var=21; output;
    run;
    
    proc sql;
    create table ab_reconciliation
    as
    select a.subj, a.dat format=date9., b.dat_new format=date9., b.other_var
    from a as t1
    left join b as t2
    on t1.subj=t2.subj
    and dat=dat_new; 
    quit;

Output I am getting

Output I have

Output I want

enter image description here

Appreciate your help!

CodePudding user response:

One thing you might consider is just INTERLEAVING the observations by ID and DATE and just using the latest value of OTHER_VAR on the dates from A.

So if you have this input data:

data a;
  input subj dat :date.;
  format dat date9.;
cards;
 1 01JAN2022
 1 01FEB2022
 1 05MAR2022
 2 10MAY2022
 2 11JUN2022
;

data b;
  input subj dat_new :date. other_var;
  format dat_new date9.;
cards;
 1 01JAN2022 1
 1 01FEB2022 2
 1 05MAR2022 3
 2         . 11
 2 11JUN2022 21
;

You can combine them like this:

data want;
  set b(in=inb rename=(dat_new=dat)) a(in=ina) ;
  by subj dat;
  if first.id then other=.;
  if inb then other=other_var;
  retain other;
  if ina ;
  drop other_var ;
  rename other=other_var;
run;
  • Related