Home > OS >  Pass a list of variables through is.na when conditionally mutating with case_when
Pass a list of variables through is.na when conditionally mutating with case_when

Time:06-13

I am working with survey data where I need to identify whether responses were not applicable or simply skipped.

  • x1 and x2 are my primary response questions. x3 is only filled out when someone indicates if x1 and x2 are not applicable.
  • I want row 1 and row 2 to equal 0 as x1 and x2 were not applicable, and I want row 4 to equal 1 as x1 and x2 was correctly reported as "None of the above."
  • Row 3 is the truly missing observation as neither x1, x2, or x3 were answered.

My code below works, but is there a simplier way where I can just refer to my list variable rather than writing a lengthy AND statement using x1 and x2? I have simplified this example as my full dataset has 10 x variables.

library(dplyr)
df3 <- tibble(x1 = c("Yes", "No", NA, NA), x2 = c("Yes", "Yes", NA, NA), x3 = c(NA, NA, NA, "None of the above"))
list <- c("x1", "x2")

df3 %>% 
  mutate(
    x3 = coalesce(as.integer(x3 == "None of the above"), 0),
    x3 = case_when(
      x3 == 1 ~ 1,
      is.na(x1) & is.na(x2) ~ NA_real_,
      TRUE ~ 0))

CodePudding user response:

A minimal revision is replacing is.na(x1) & is.na(x2) with if_all(..., is.na), i.e.

df3 %>% 
  mutate(
    x3 = coalesce(as.integer(x3 == "None of the above"), 0),
    x3 = case_when(
      x3 == 1 ~ 1,
      if_all(all_of(list), is.na) ~ NA_real_,
      TRUE ~ 0))

With if_all(), you don't even need to refer to the list variable, just pass a simple tidy-select helper instead, e.g.

  • if_all(x1:x2, is.na) (This way x1 & x2 must be consecutive columns)
  • if_all(starts_with("x"), is.na)
  • if_all(num_range("x", 1:2), is.na)

CodePudding user response:

Here is one idea. We can use all_of, select to select the columns based on the target vector (I changed list to target as list is a function in R), then use is.na and rowSums to evaluate if all columns are NA. We can combine this strategy with case_when to create the x3 column.

library(dplyr)

target <- c("x1", "x2")

df3 %>%
  mutate(
    x3 = case_when(
      rowSums(is.na(select(., all_of(target)))) == 0               ~0L,
      rowSums(is.na(select(., all_of(target)))) != 0 & !is.na(x3)  ~1L,
      TRUE                                                         ~NA_integer_
    ), 
  )
# # A tibble: 4 × 3
#   x1    x2       x3
#   <chr> <chr> <int>
# 1 Yes   Yes       0
# 2 No    Yes       0
# 3 NA    NA       NA
# 4 NA    NA        1
  • Related