Complex case_when() statement based on values of multiple columns (dplyr)

Time:10-12

I have a data.frame with risk-of-bias categories in separate columns, in the form:

# 21 random risk-of-bias ratings ("y", "n" or "m") per question
a <- data.frame(
  Q1_long_name = sample(c("y", "n", "m"), 21, replace = TRUE),
  Q2_long_name = sample(c("y", "n", "m"), 21, replace = TRUE),
  Q3_long_name = sample(c("y", "n", "m"), 21, replace = TRUE),
  Q4_long_name = sample(c("y", "n", "m"), 21, replace = TRUE),
  Q5_long_name = sample(c("y", "n", "m"), 21, replace = TRUE),
  Q6_long_name = sample(c("y", "n", "m"), 21, replace = TRUE),  # comma was missing here
  Q7_long_name = sample(c("y", "n", "m"), 21, replace = TRUE)
)

Because my variables have really long names (required by another function), my case_when() statements end up long and hard to read, for example:

library(dplyr)

a %>% 
  mutate(overall_rob =
           case_when(
             Q1_long_name == "y" & Q2_long_name == "n" & Q3_long_name == "n" & Q5_long_name != "m" ~ "high",
             Q1_long_name == "n" | Q2_long_name == "n" | Q3_long_name == "n" | Q5_long_name != "m" ~ "low",
             TRUE ~ "unclear"))
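For reference, case_when() checks its conditions in order and returns the value of the first one that is TRUE, so a row that satisfies both the "high" and the "low" rule is labelled "high". A minimal sketch on a small hypothetical three-row data frame (`demo`, containing only the columns the rules use) shows this:

```r
library(dplyr)

# Hypothetical three-row example using the question's long column names
demo <- data.frame(
  Q1_long_name = c("y", "n", "m"),
  Q2_long_name = c("n", "y", "y"),
  Q3_long_name = c("n", "y", "y"),
  Q5_long_name = c("y", "m", "m")
)

res <- demo %>%
  mutate(overall_rob = case_when(
    # checked first: row 1 also meets the "low" rule but is labelled "high"
    Q1_long_name == "y" & Q2_long_name == "n" &
      Q3_long_name == "n" & Q5_long_name != "m" ~ "high",
    Q1_long_name == "n" | Q2_long_name == "n" |
      Q3_long_name == "n" | Q5_long_name != "m" ~ "low",
    # row 3 meets neither rule
    TRUE ~ "unclear"
  ))

res$overall_rob
# "high" "low" "unclear"
```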

I managed to do it by renaming my variables before calling case_when() and then renaming them back, but it still looks messy (as TarJae pointed out).

a %>% 
rename_with(.cols=matches("^Q"), ~ gsub("^(Q[0-9]).*","\\1", .x))
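A sketch of the full rename-and-restore round trip, assuming single-digit question numbers (with the `^(Q[0-9]).*` pattern, a hypothetical `Q10_...` column would collide with `Q1`). The data frame `a` here is a small hypothetical stand-in, and `long_names` and `short_names` are illustrative helper variables:

```r
library(dplyr)

# Small hypothetical data set following the question's naming pattern
a <- data.frame(
  Q1_long_name = c("y", "n", "m"),
  Q2_long_name = c("n", "y", "y"),
  Q3_long_name = c("n", "y", "y"),
  Q5_long_name = c("y", "m", "m")
)

long_names  <- names(a)
short_names <- gsub("^(Q[0-9]).*", "\\1", long_names)  # "Q1", "Q2", "Q3", "Q5"

res <- a %>%
  # 1. shorten the names so the conditions stay readable
  rename_with(~ gsub("^(Q[0-9]).*", "\\1", .x), .cols = matches("^Q")) %>%
  mutate(overall_rob = case_when(
    Q1 == "y" & Q2 == "n" & Q3 == "n" & Q5 != "m" ~ "high",
    Q1 == "n" | Q2 == "n" | Q3 == "n" | Q5 != "m" ~ "low",
    TRUE ~ "unclear"
  )) %>%
  # 2. map each short name back to its original long name
  rename_with(~ long_names[match(.x, short_names)], .cols = all_of(short_names))

names(res)
# "Q1_long_name" "Q2_long_name" "Q3_long_name" "Q5_long_name" "overall_rob"
```

Using match() for the way back means the restore does not depend on the order in which the columns are selected.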

So I was wondering: is there any way to streamline case_when() so that it uses %in% or something similar to specify multiple conditions at once? If not, TarJae's approach would definitely be easier.
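One way to state several column conditions at once, assuming dplyr 1.0.4 or later (which added if_any() and if_all()), is to compute the grouped conditions as temporary logical columns, selecting the columns by pattern rather than by full name, and to use %in% for set membership. This is a sketch on a small hypothetical version of `a`; the helper names `both_n` and `any_n` are illustrative:

```r
library(dplyr)

# Small hypothetical stand-in for the question's data frame `a`
a <- data.frame(
  Q1_long_name = c("y", "n", "m"),
  Q2_long_name = c("n", "y", "y"),
  Q3_long_name = c("n", "y", "y"),
  Q5_long_name = c("y", "m", "m")
)

res <- a %>%
  mutate(
    # TRUE where both Q2 and Q3 are "n" (columns picked by pattern, not full name)
    both_n = if_all(matches("^Q[23]_"), ~ .x == "n"),
    # TRUE where at least one of Q1, Q2, Q3 is "n"
    any_n = if_any(matches("^Q[123]_"), ~ .x == "n"),
    overall_rob = case_when(
      # %in% tests set membership; for y/n/m data without NAs it equals != "m"
      Q1_long_name == "y" & both_n & Q5_long_name %in% c("y", "n") ~ "high",
      any_n | Q5_long_name %in% c("y", "n") ~ "low",
      TRUE ~ "unclear"
    )
  ) %>%
  select(-both_n, -any_n)  # drop the temporary helper columns

res$overall_rob
# "high" "low" "unclear"
```

Because matches() ignores case by default, the helper names are chosen so they cannot themselves match the `^Q...` patterns.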

Answer:

Are you looking for a solution like this?

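library(dplyr)
library(stringr)  # str_extract() comes from stringr

a %>% 
  # keep only the part before the first "_": "Q1_long_name" -> "Q1"
  rename_with(~ str_extract(., "^[^_]+(?=_)")) %>% 
  mutate(overall_rob =
           case_when(
             Q1 == "y" & Q2 == "n" & Q3 == "n" & Q5 != "m" ~ "high",
             Q1 == "n" | Q2 == "n" | Q3 == "n" | Q5 != "m" ~ "low",
             TRUE ~ "unclear"))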
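The pattern in this answer keeps everything before the first underscore. A small check on two of the question's column names follows; the base-R sub() line is an alternative that avoids the stringr dependency, not part of the original answer. Note that str_extract() returns NA for a name with no underscore, so if `a` had other columns it would likely be safer to restrict rename_with() with `.cols = matches("^Q")`.

```r
library(stringr)

long_names <- c("Q1_long_name", "Q5_long_name")

# "^[^_]+" : one or more characters that are not "_", anchored at the start
# "(?=_)"  : lookahead requiring a "_" next, without including it in the match
short_stringr <- str_extract(long_names, "^[^_]+(?=_)")

# Base-R equivalent: delete everything from the first "_" onward
short_base <- sub("_.*$", "", long_names)

short_stringr  # "Q1" "Q5"
short_base     # "Q1" "Q5"
```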

Answer:

Maybe like this?

a %>% 
  # helper flag for the "high" rule
  mutate(case1 = Q1_long_name == "y" & 
           Q2_long_name == "n" & 
           Q3_long_name == "n" & 
           Q5_long_name != "m") %>%
  # helper flag for the "low" rule
  mutate(case2 = Q1_long_name == "n" | 
           Q2_long_name == "n" | 
           Q3_long_name == "n" | 
           Q5_long_name != "m") %>%   # this pipe was missing
  mutate(overall_rob =
           case_when(
             case1 ~ "high",
             case2 ~ "low",
             TRUE ~ "unclear"))
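As written, that pipeline leaves the helper columns case1 and case2 in the result. A sketch that builds the flags, uses them and then drops them, relying on the fact that a later argument of mutate() can refer to a column created by an earlier one; the data frame is again a small hypothetical stand-in for `a`:

```r
library(dplyr)

# Small hypothetical stand-in for the question's data frame `a`
a <- data.frame(
  Q1_long_name = c("y", "n", "m"),
  Q2_long_name = c("n", "y", "y"),
  Q3_long_name = c("n", "y", "y"),
  Q5_long_name = c("y", "m", "m")
)

res <- a %>%
  mutate(
    # flag for the "high" rule
    case1 = Q1_long_name == "y" & Q2_long_name == "n" &
      Q3_long_name == "n" & Q5_long_name != "m",
    # flag for the "low" rule
    case2 = Q1_long_name == "n" | Q2_long_name == "n" |
      Q3_long_name == "n" | Q5_long_name != "m",
    # later arguments can use columns created earlier in the same mutate()
    overall_rob = case_when(
      case1 ~ "high",
      case2 ~ "low",
      TRUE ~ "unclear"
    )
  ) %>%
  select(-case1, -case2)  # remove the temporary flags

names(res)
# "Q1_long_name" "Q2_long_name" "Q3_long_name" "Q5_long_name" "overall_rob"
```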