Setting missing values using labelled package across multiple columns?

Time:03-17

I am using the labelled package and trying to set user-defined missing values. I have a dataframe where I want to set missing values for a list of specific columns rather than the entire dataset.

Currently I have to type out each column (s2 and s3). Is there a more efficient way? My full dataset has dozens of columns.

library(labelled)
library(dplyr)

df <- tibble(s1 = c(1, 2, 3, 9), s2 = c(1, 1, 2, 9), s3 = c(1, 1, 2, 9))
df <- df %>%
  set_na_values(., s2 = 9) %>%
  set_na_values(., s3 = 9)
na_values(df$s1)
na_values(df$s2)
na_values(df$s3)

CodePudding user response:

The set_na_values() function accepts multiple name = value pairs, so you don't need to call it more than once:

library(labelled)
library(dplyr)

df %>%
  set_na_values(s2 = 9, s3 = 9)
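
To confirm which columns were affected, a quick check along these lines may help. This is only a minimal sketch: it rebuilds the example data from the question, the name out is illustrative, and it relies on is.na() treating user-defined missing values as missing for labelled_spss vectors.

```r
library(labelled)
library(dplyr)

# Example data copied from the question
df <- tibble(s1 = c(1, 2, 3, 9), s2 = c(1, 1, 2, 9), s3 = c(1, 1, 2, 9))

# `out` is just an illustrative name for the result
out <- df %>%
  set_na_values(s2 = 9, s3 = 9)

na_values(out$s1)  # s1 was not touched, so no user-defined missing values
na_values(out$s2)  # the missing-value code set for s2
is.na(out$s2)      # user-defined missing values are reported as missing
```

If the 9s should become real NA values for analysis, labelled also provides user_na_to_na().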

If you are dealing with a lot of variables, you can programmatically build a named vector (or a named list, if there are multiple missing values per variable) and splice it into the function call with !!!. If, as your comment suggests, you want to apply it to everything except the s1 variable, you can do:

nm <- setdiff(names(df), "s1")

df %>%
  set_na_values(!!!setNames(rep(9, length(nm)), nm))

# A tibble: 4 x 3
     s1        s2        s3
  <dbl> <dbl lbl> <dbl lbl>
1     1    1         1     
2     2    1         1     
3     3    2         2     
4     9    9 (NA)    9 (NA)
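
The named-list variant mentioned above, for several missing values per variable, could look roughly like this. It is a sketch under assumptions: the codes 8 and 9 and the names na_list and out are made up for illustration, and the data is a small variation on the question's example.

```r
library(labelled)
library(dplyr)

# Variation on the question's data with two assumed missing codes (8 and 9)
df <- tibble(s1 = c(1, 2, 8, 9), s2 = c(1, 8, 2, 9), s3 = c(1, 1, 8, 9))

# Every column except s1
nm <- setdiff(names(df), "s1")

# One list element per column, each holding all missing codes for that column
na_list <- setNames(rep(list(c(8, 9)), length(nm)), nm)

# Splice the list so each element becomes a `column = values` argument
out <- df %>%
  set_na_values(!!!na_list)

na_values(out$s2)
na_values(out$s3)
```

Because the list is built from column names, the same pattern should carry over to dozens of columns without typing each one.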

Alternatively, you can use labelled_spss() together with across(), which supports tidyselect semantics (but note that this will overwrite any existing value labels):

df %>%
  mutate(across(-s1, ~ labelled_spss(.x, na_values = 9)))

# A tibble: 4 x 3
     s1        s2        s3
  <dbl> <dbl lbl> <dbl lbl>
1     1    1         1     
2     2    1         1     
3     3    2         2     
4     9    9 (NA)    9 (NA)

To keep any existing value labels, pass them back in explicitly:

df %>%
  mutate(across(-s1, ~ labelled_spss(.x, labels = val_labels(.x), na_values = 9)))
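
The difference between the two across() approaches only shows up when a column already has value labels. The following sketch illustrates it; the labels yes, no and refused and the names dropped and kept are invented for the example and are not part of the original data.

```r
library(labelled)
library(dplyr)

# s2 carries hypothetical value labels before any missing values are set
df <- tibble(
  s1 = c(1, 2, 3, 9),
  s2 = labelled(c(1, 1, 2, 9), labels = c(yes = 1, no = 2, refused = 9))
)

# Without `labels`, labelled_spss() rebuilds the vector with no value labels
dropped <- df %>%
  mutate(across(-s1, ~ labelled_spss(.x, na_values = 9)))

# Passing val_labels(.x) back in carries the existing labels over
kept <- df %>%
  mutate(across(-s1, ~ labelled_spss(.x, labels = val_labels(.x), na_values = 9)))

val_labels(dropped$s2)
val_labels(kept$s2)
na_values(kept$s2)
```

Note that set_na_values() does not have this issue, since it only modifies the missing-value definition of the columns it is given.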