Home > Enterprise >  How do I count missing values per variable in PROC SQL?
How do I count missing values per variable in PROC SQL?

Time:10-06

I need to check my variables for missing values, and I figured PROC SQL seemed like a decent way of doing things. The code below does not seem to work however, it just outputs the total number of observations in the dataset:

PROC SQL; SELECT count(UtgUtl_FlygSSEK_Pers is missing) FROM IBIS3_5;
WHERE Exkl_UtgUtl_Flyg=1;
quit;

How can I make this work?

CodePudding user response:

You are counting how many times the result of testing if the variable is missing results in a non-missing value. Since the result of the test is always going to be either 1 (TRUE) or 0 (FALSE) it is never missing. So you are doing the same thing as if you did COUNT(*).

So the SQL method is to just use count(UtgUtl_FlygSSEK_Pers) to count the non-missing values. If you want the number of missing values instead then subtract from the total number of observations.

count(*)-count(UtgUtl_FlygSSEK_Pers)

But why not use the MISSING() function?

PROC SQL; 
   SELECT Exkl_UtgUtl_Flyg
        , count(*) as N_observations
        , sum(missing(UtgUtl_FlygSSEK_Pers)) as N_missing
   FROM IBIS3_5
   group by Exkl_UtgUtl_Flyg
   ;
quit;

CodePudding user response:

The nmiss-function seems to do the job:

PROC SQL; 
SELECT nmiss(UtgUtl_FlygSSEK_Pers)
FROM IBIS3_5
WHERE Exkl_UtgUtl_Flyg=1;
quit;
  • Related