Home > Enterprise >  fill the nulls of a column with the mean sum of the division of two columns SAS
fill the nulls of a column with the mean sum of the division of two columns SAS

Time:01-29

I need to fill the nulls of a column with the mean sum of the division of two columns An example would be

A B C ... B_01 C_01
5 . .
5 2 3
7 3 1,2
9 3 0,3
4 . .

Well, I would like the missing value for column B to be (2/5 3/7 3/9) / 3 * its corresponding column A For column new_c (3/5 1,2/7 0,3/9)/3 * its corresponding column A

I have thought about doing this, but it turns out that I have 60 columns with which to do it and the only way it comes to mi mind is to do this 60 times.

Proc sql;
  create table new as
  Select *
  , sum(B/A)/sum(case when B is missimg then . else 1) end as new_B
  , sum(C/A)/sum(case when C is missimg then . else 1) end as new_C_01
  from table_one
  ;

Thanks

CodePudding user response:

PROC SQL should be able to do that easily.

First let's convert your data listing into an actual dataset.

data have;
  input A B C ;
cards;
5 . .
5 2 3
7 3 1.2
9 3 0.3
4 . .
;

Now let's use it to create a new version of B that follows your rules.

proc sql;
create table want as 
  select *,coalesce(b,a*mean(b/a)) as new_b
  from have
;
quit;

Results:

OBS    A    B     C      new_b

 1     5    .     .     1.93651
 2     5    2    3.0    2.00000
 3     7    3    1.2    3.00000
 4     9    3    0.3    3.00000
 5     4    .     .     1.54921

CodePudding user response:

You can use Proc MEANS to compute the mean fraction of each of the 60 columns, and apply the imputation rule in a DATA step.

Example:

data have;
  call streaminit(20230129);
  do row = 1 to 100;
    a = rand('integer', 30);
    array x x1-x60;
    do over x;
      x = ifn(rand('uniform') > 0.30, rand('integer', a-1), .);
    end;
    output;
  end;
run;

data fractions;
  set have;
  array f x1-x60;
  do over f;
    if not missing(f) then f = f / a;
  end;
  rename x1-x60 = f1-f60;
run;

proc means noprint data=fractions;
  output out=means mean(f1-f60)=mean1-mean60;
  var f1-f60;
run;

data want;
  set have;

  one = 1;
  set means point=one;

  array means mean1-mean60;
  array x x1-x60;

  do over x;
    if missing (x) then means = means * a; else means = x;
  end;

  rename mean1-mean60=new_x1-new_x60;
run;
  • Related