I need to fill the nulls of a column with the mean sum of the division of two columns An example would be
A B C ... B_01 C_01
5 . .
5 2 3
7 3 1,2
9 3 0,3
4 . .
Well, I would like the missing value for column B to be (2/5 3/7 3/9) / 3 * its corresponding column A For column new_c (3/5 1,2/7 0,3/9)/3 * its corresponding column A
I have thought about doing this, but it turns out that I have 60 columns with which to do it and the only way it comes to mi mind is to do this 60 times.
Proc sql;
create table new as
Select *
, sum(B/A)/sum(case when B is missimg then . else 1) end as new_B
, sum(C/A)/sum(case when C is missimg then . else 1) end as new_C_01
from table_one
;
Thanks
CodePudding user response:
PROC SQL should be able to do that easily.
First let's convert your data listing into an actual dataset.
data have;
input A B C ;
cards;
5 . .
5 2 3
7 3 1.2
9 3 0.3
4 . .
;
Now let's use it to create a new version of B that follows your rules.
proc sql;
create table want as
select *,coalesce(b,a*mean(b/a)) as new_b
from have
;
quit;
Results:
OBS A B C new_b
1 5 . . 1.93651
2 5 2 3.0 2.00000
3 7 3 1.2 3.00000
4 9 3 0.3 3.00000
5 4 . . 1.54921
CodePudding user response:
You can use Proc MEANS to compute the mean fraction of each of the 60 columns, and apply the imputation rule in a DATA step.
Example:
data have;
call streaminit(20230129);
do row = 1 to 100;
a = rand('integer', 30);
array x x1-x60;
do over x;
x = ifn(rand('uniform') > 0.30, rand('integer', a-1), .);
end;
output;
end;
run;
data fractions;
set have;
array f x1-x60;
do over f;
if not missing(f) then f = f / a;
end;
rename x1-x60 = f1-f60;
run;
proc means noprint data=fractions;
output out=means mean(f1-f60)=mean1-mean60;
var f1-f60;
run;
data want;
set have;
one = 1;
set means point=one;
array means mean1-mean60;
array x x1-x60;
do over x;
if missing (x) then means = means * a; else means = x;
end;
rename mean1-mean60=new_x1-new_x60;
run;