case_when doesn't work with multiple conditions over multiple variables

Time:12-21

I just discovered that case_when might not work as expected when a variable is recoded based on multiple variables.

Reproducible data:

data <- data.frame(f103 = c(2, NA, NA, 1, 2, 2),
                   f76 = c(2, NA, NA, NA, 3, 3),
                   f4 = c(1, 3, 3, 1, 1, 2))

The following code produces the same results for var1 and var2 (which is not what I want):

library(dplyr)

reprdata <- data %>%
  mutate(var1 = f4) %>% 
  mutate(var1 = case_when(f103 == 2 ~ 3, TRUE ~ as.numeric(var1))) %>%
  mutate(var2 = f4) %>% 
  mutate(var2 = case_when(f103 == 2 ~ 3, f76 == 1 ~ 1, f76 == 2 ~ 2, f76 == 3 ~ 3, TRUE ~ as.numeric(var2)))

The following produces the correct result (i.e., the solution to my problem):

reprdata <- data %>%
  mutate(var1 = f4) %>% 
  mutate(var1 = case_when(f103 == 2 ~ 3, TRUE ~ as.numeric(var1))) %>%
  mutate(var2 = f4) %>% 
  mutate(var2 = case_when(f103 == 2 ~ 3, TRUE ~ as.numeric(var2))) %>%
  mutate(var2 = case_when(f76 == 1 ~ 1, f76 == 2 ~ 2, f76 == 3 ~ 3, TRUE ~ as.numeric(var2)))

(I am aware that in this snippet of my data, the f103 condition for var1 is superfluous; still, I wouldn't expect it to cause this issue.)

I'd be interested to know if someone can explain to me why this problem occurs and how to prevent it in the future.

CodePudding user response:

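It has to do with how case_when evaluates: for each row, it checks the conditions in the order they are written and uses the first one that is TRUE, so a later condition can never override an earlier match. That is contrary to what many people intuitively expect (in my experience), namely that later conditions overwrite earlier ones. In your single call, f103 == 2 comes first, so the f76 conditions are never reached for those rows. To give f76 priority, list it first: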

f76 wins (what you expect!)

library(dplyr)

data |>
    mutate(var1 = case_when(f103 == 2 ~ 3,
                            TRUE ~ f4)) |>
    mutate(var2 = case_when(f76 %in% 1:3 ~ f76,
                            f103 == 2 ~ 3, # NB!
                            TRUE ~ f4))
  f103 f76 f4 var1 var2
1    2   2  1    3    2
2   NA  NA  3    3    3
3   NA  NA  3    3    3
4    1  NA  1    1    1
5    2   3  1    3    3
6    2   3  2    3    3

f103 wins (what you don't expect)

library(dplyr)

data |>
    mutate(var1 = case_when(f103 == 2 ~ 3,
                            TRUE ~ f4)) |>
    mutate(var2 = case_when(f103 == 2 ~ 3, # NB!
                            f76 %in% 1:3 ~ f76,
                            TRUE ~ f4))
  f103 f76 f4 var1 var2
1    2   2  1    3    3
2   NA  NA  3    3    3
3   NA  NA  3    3    3
4    1  NA  1    1    1
5    2   3  1    3    3
6    2   3  2    3    3
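
To see the ordering rule in isolation, here is a small sketch. The vector x and the labels are made up for illustration and are not part of the question; the second half reuses the question's data and checks that a single case_when with f76 listed first gives the same var2 as the two-step version from the question.

```r
library(dplyr)

# Toy vector (hypothetical): every element satisfies x > 0, so the
# later, more specific condition x > 4 is never reached.
x <- c(1, 5, 10)
general_first <- case_when(x > 0 ~ "positive",
                           x > 4 ~ "large",
                           TRUE  ~ "other")

# Listing the more specific condition first gives it priority.
specific_first <- case_when(x > 4 ~ "large",
                            x > 0 ~ "positive",
                            TRUE  ~ "other")

# Same data as in the question.
data <- data.frame(f103 = c(2, NA, NA, 1, 2, 2),
                   f76  = c(2, NA, NA, NA, 3, 3),
                   f4   = c(1, 3, 3, 1, 1, 2))

# Two separate recodes: the second mutate overwrites the first.
two_step <- data %>%
  mutate(var2 = case_when(f103 == 2 ~ 3, TRUE ~ f4)) %>%
  mutate(var2 = case_when(f76 %in% 1:3 ~ f76, TRUE ~ var2))

# One recode with the higher-priority condition (f76) listed first.
one_step <- data %>%
  mutate(var2 = case_when(f76 %in% 1:3 ~ f76,
                          f103 == 2 ~ 3,
                          TRUE ~ f4))

print(general_first)
print(specific_first)
print(identical(two_step$var2, one_step$var2))
```

So, as a rule of thumb, order the conditions from highest to lowest priority (usually most specific to most general), and keep the catch-all TRUE last.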