How to tag all missing values of an SPSS labelled dataframe by their label?

Time:08-17

I have labeled SPSS data like this:

library(labelled)
library(tidyverse)

test <- tibble(
  var_1 = labelled_spss(
    c(1:4, 89, 999),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 89, "does not apply" = 999)
  ),
  var_2 = labelled_spss(
    c(1:4, 890, 998),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 890, "does not apply" = 998)
  )
)

test

# A tibble: 6 × 2
                 var_1                var_2
             <dbl lbl>            <dbl lbl>
1   1 [Terrible]         1 [Terrible]      
2   2 [Meh]              2 [Meh]           
3   3 [Better]           3 [Better]        
4   4 [Awesome]          4 [Awesome]       
5  89 [dk]             890 [dk]            
6 999 [does not apply] 998 [does not apply]

Note the different numeric values for dk and does not apply.
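To make the mismatch concrete, here is a minimal sketch that looks up the codes behind the two labels with val_labels() from labelled. The standalone vectors var_1 and var_2 mirror the columns above, and the names codes_1 and codes_2 are only illustrative.

```r
library(labelled)

# Standalone copies of the two columns from the example data.
var_1 <- labelled_spss(
  c(1:4, 89, 999),
  c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 89, "does not apply" = 999)
)
var_2 <- labelled_spss(
  c(1:4, 890, 998),
  c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 890, "does not apply" = 998)
)

# val_labels() returns a named vector (label -> code), so it can be indexed by label.
codes_1 <- val_labels(var_1)[c("dk", "does not apply")]
codes_2 <- val_labels(var_2)[c("dk", "does not apply")]

codes_1
codes_2
```

The same two labels resolve to different codes in each variable, which is why a single hard-coded vector of numeric values cannot be reused across columns.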

I would like to set dk and does not apply as NAs programmatically (i.e. without individually specifying the name of the variable) and also based on the label rather than the value.

My idea is something like this pseudocode:

my_na_labels <- c("dk", "does not apply")

test %>%
   mutate(across(c(var_1, var_2), ~ set_na_values(. %in% my_na_labels)))

Which unfortunately does not work.

The solution given in the labelled vignette uses the variable names and also tags NAs based on their numeric values. Programmatically tagging NAs by numeric value does not work here, because the same label is attached to a different numeric value in each variable. I am therefore looking for a solution that does not require hard-coded numeric values but works with the existing labels instead.

The outcome, which I can easily produce if I use hard-coded NA values, should look something like this and should be generalizable for many variables:

test %>% 
   set_na_values(var_1 = c(89, 999), 
                 var_2 = c(890, 998))

# A tibble: 6 × 2
                      var_1                     var_2
                  <dbl lbl>                 <dbl lbl>
1   1 [Terrible]              1 [Terrible]           
2   2 [Meh]                   2 [Meh]                
3   3 [Better]                3 [Better]             
4   4 [Awesome]               4 [Awesome]            
5  89 (NA) [dk]             890 (NA) [dk]            
6 999 (NA) [does not apply] 998 (NA) [does not apply]

Answer:

How about:

library(labelled)
library(dplyr)

my_na_labels <- c("dk", "does not apply")

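# Tag as user-defined missing every value whose label appears in `varlabels`.
fun <- function(x, varlabels) {
  labels <- val_labels(x)
  # Match on label names, so a variable that lacks one of the labels
  # does not end up with an NA among its na_values.
  na_values(x) <- unname(labels[names(labels) %in% varlabels])
  x
}

test |>
  mutate(across(c("var_1", "var_2"), ~ fun(.x, varlabels = my_na_labels)))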

Output:

# A tibble: 6 × 2
                      var_1                     var_2
                  <dbl lbl>                 <dbl lbl>
1   1 [Terrible]              1 [Terrible]           
2   2 [Meh]                   2 [Meh]                
3   3 [Better]                3 [Better]             
4   4 [Awesome]               4 [Awesome]            
5  89 (NA) [dk]             890 (NA) [dk]            
6 999 (NA) [does not apply] 998 (NA) [does not apply]
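
If the column names should not be spelled out at all, one possible variation is to select every labelled column with where(). This is a sketch rather than a tested general solution: the helper name tag_na_by_label is made up for illustration, and it assumes haven::is.labelled() (haven is a dependency of labelled) is an acceptable predicate for the columns you want to touch.

```r
library(labelled)
library(dplyr)

# Same example data as in the question.
test <- tibble(
  var_1 = labelled_spss(
    c(1:4, 89, 999),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 89, "does not apply" = 999)
  ),
  var_2 = labelled_spss(
    c(1:4, 890, 998),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 890, "does not apply" = 998)
  )
)

my_na_labels <- c("dk", "does not apply")

# Hypothetical helper: mark as user-defined NA every code whose label is in `na_labels`.
tag_na_by_label <- function(x, na_labels) {
  labels <- val_labels(x)
  na_values(x) <- unname(labels[names(labels) %in% na_labels])
  x
}

# Apply the helper to every labelled column instead of naming the columns.
tagged <- test |>
  mutate(across(where(haven::is.labelled), ~ tag_na_by_label(.x, my_na_labels)))

# Inspect the result: the tagged codes per column, and which rows now count as missing.
na_values(tagged$var_1)
na_values(tagged$var_2)
is.na(tagged$var_1)
```

Matching with %in% on the label names means a column that only carries some of the labels in my_na_labels still gets tagged for the ones it has.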