I need to perform a number of imputations for missing values for modes of transport. There are 12 different modes of transport. As a preliminary step I calculate the number of people using each mode of transport and the corresponding quota:
My code looks like this:
PROC SQL;
CREATE TABLE hjälptabell_transport_SE as
SELECT R_fard_h,antal,antal/sum(antal) as kvot
FROM
(SELECT R_fard_h,count(R_fard_h) as antal
FROM data_resor14_2
WHERE R_fard_h is not missing and R_res_land_grp="Sverige"
GROUP BY R_fard_h);
quit;
And my output ideally looks something like this:
However, in some case zero observations uses a certain mode of transport, making the output look like this instead:
Note that the number of people using the eleventh mode of transport is omitted. It's actually very important that every table is precisely 12 rows long, and if precisely zero people uses a certain mode of transport I still want that row to be included, but with a count of zero.
How can I make this happen?
CodePudding user response:
Simple solution would be to create a table (or improvise a view) with numbers 1 to 12 and then add left join to table data_resor14_2
CodePudding user response:
First, create a template table with the numbers 1-12:
data template;
do r_fard_h = 1 to 12;
output;
end;
run;
Then perform a right join with it on your data. Use coalesce
to fill in any missing values with 0.
PROC SQL;
CREATE TABLE hjälptabell_transport_SE as
SELECT t2.R_fard_h
, antal
, coalesce(antal/sum(antal), 0) as kvot
FROM (SELECT R_fard_h, count(R_fard_h) as antal
FROM data_resor14_2
WHERE R_fard_h is not missing AND R_res_land_grp="Sverige"
GROUP BY R_fard_h
) as t1
RIGHT JOIN
template as t2
ON t1.r_fard_h = t2.r_fard_h
;
quit;