I need to check my variables for missing values, and I figured PROC SQL seemed like a decent way of doing things. The code below does not seem to work however, it just outputs the total number of observations in the dataset:
PROC SQL; SELECT count(UtgUtl_FlygSSEK_Pers is missing) FROM IBIS3_5;
WHERE Exkl_UtgUtl_Flyg=1;
quit;
How can I make this work?
CodePudding user response:
You are counting how many times the result of testing if the variable is missing results in a non-missing value. Since the result of the test is always going to be either 1 (TRUE) or 0 (FALSE) it is never missing. So you are doing the same thing as if you did COUNT(*).
So the SQL method is to just use count(UtgUtl_FlygSSEK_Pers) to count the non-missing values. If you want the number of missing values instead then subtract from the total number of observations.
count(*)-count(UtgUtl_FlygSSEK_Pers)
But why not use the MISSING() function?
PROC SQL;
SELECT Exkl_UtgUtl_Flyg
, count(*) as N_observations
, sum(missing(UtgUtl_FlygSSEK_Pers)) as N_missing
FROM IBIS3_5
group by Exkl_UtgUtl_Flyg
;
quit;
CodePudding user response:
The nmiss-function seems to do the job:
PROC SQL;
SELECT nmiss(UtgUtl_FlygSSEK_Pers)
FROM IBIS3_5
WHERE Exkl_UtgUtl_Flyg=1;
quit;