In R, create multiple new variables based on an exclusive condition

Time:02-21

I need help with a seemingly easy task. I would like to create several new variables based on a single condition within dplyr::mutate. I can create one new variable using ifelse, but I would like to create several in one step.

Let's assume this is my data frame:

df_have <- data.frame(id = 1:10, x = 1:2, y = sample(10, replace = TRUE))

I would like something like this:

df_want <- mutate(df_have,
                  if_else(y<3, var1 = x, var2 = y/x, var3 = T)) 

So the single condition y < 3 should create three new variables at once.

I would like to use this with if_else and case_when.

Thanks in advance.

CodePudding user response:

You can use case_when inside mutate for each of the new variables individually. Note that the outputs of all case_when cases need to be the same type, so the typed missing value (NA_integer_, NA_real_, NA_character_, and so on) may need to be adjusted to match the actual data.

library(dplyr)  # provides %>%, mutate() and case_when()

set.seed(20)
df_have <- data.frame(id = 1:10, x = 1:2, y = sample(10, replace = TRUE))

df_have %>%
  dplyr::mutate(
    var1 = case_when(y < 3 ~ x,
                     TRUE ~ NA_integer_),
    var2 = case_when(y < 3 ~ y/x,
                     TRUE ~ NA_real_),
    var3 = case_when(y < 3 ~ TRUE,
                     TRUE ~ FALSE)
  )

#   id x y var1 var2  var3
#1   1 1 6   NA   NA FALSE
#2   2 2 8   NA   NA FALSE
#3   3 1 2    1    2  TRUE
#4   4 2 9   NA   NA FALSE
#5   5 1 2    1    2  TRUE
#6   6 2 9   NA   NA FALSE
#7   7 1 3   NA   NA FALSE
#8   8 2 9   NA   NA FALSE
#9   9 1 8   NA   NA FALSE
#10 10 2 2    2    1  TRUE
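
If the goal is to write the condition only once, one possible sketch is to have a small helper return a tibble and call it unnamed inside mutate. This assumes dplyr 1.0.0 or later, where an unnamed data frame result in mutate is unpacked into separate columns. The helper name make_vars is made up for illustration and is not part of dplyr.

```r
library(dplyr)

set.seed(20)
df_have <- data.frame(id = 1:10, x = 1:2, y = sample(10, replace = TRUE))

# Hypothetical helper (not a dplyr function): evaluates the condition once
# and returns one tibble column per new variable.
make_vars <- function(x, y) {
  cond <- y < 3
  tibble(
    var1 = if_else(cond, x, NA_integer_),   # integer, NA where y >= 3
    var2 = if_else(cond, y / x, NA_real_),  # double, NA where y >= 3
    var3 = cond                             # TRUE/FALSE flag
  )
}

# The unnamed tibble is unpacked into var1, var2 and var3
# (assumes dplyr >= 1.0.0).
df_want <- df_have %>%
  mutate(make_vars(x, y))
```

The same helper could use case_when instead of if_else. The type rule mentioned above still applies to each column.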

CodePudding user response:

Something like this creates all the variables at once, although frankly I don't recommend it:

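library(dplyr)
library(tidyr)  # provides unnest_wider()

# Returns the three values for a single row as a named list
func <- function(x, y) {
  if (y < 3) {
    list(var1 = x, var2 = y / x, var3 = TRUE)
  } else {
    list(var1 = NA, var2 = NA, var3 = NA)
  }
}
df_have %>%
  rowwise() %>%
  mutate(vars = list(func(x, y))) %>%
  unnest_wider(col = vars)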

Output:

      id     x     y  var1  var2 var3 
   <int> <int> <int> <int> <dbl> <lgl>
 1     1     1     7    NA    NA NA   
 2     2     2     2     2     1 TRUE 
 3     3     1     5    NA    NA NA   
 4     4     2     3    NA    NA NA   
 5     5     1     1     1     1 TRUE 
 6     6     2     3    NA    NA NA   
 7     7     1     5    NA    NA NA   
 8     8     2     4    NA    NA NA   
 9     9     1     2     1     2 TRUE 
10    10     2     4    NA    NA NA  

It is MUCH faster to just do this, because if_else is vectorized over whole columns instead of being called once per row:

df_have %>%
  mutate(
    var1 = if_else(y < 3, x, NA_integer_),
    var2 = if_else(y < 3, y / x, NA_real_),
    var3 = if_else(y < 3, TRUE, NA)
  )