I am trying to make a new variable that depends on a few conditions. Here is an example of data similar to mine:
df <- read.table(text="
color num_1 shape num_2 season num_3 num_4
red 1 triangle 4 Fall 2 8
blue 5 square 4 Summer 8 1
green 3 square 11 Summer 4 1
red 3 circle 2 Summer 1 5
red 7 triangle 6 Winter 7 9
blue 9 square 2 Fall 7 4", header = TRUE)
I want to use mutate() and case_when() to make the new variable. For example, if color is "red" and any of the num columns is less than 3, new_var should be "yes"; if color is "blue" and any of the num columns is less than 5, new_var should also be "yes". Otherwise it should be "no". The result would look like this:
color num_1 shape num_2 season num_3 num_4 new_var
red 1 triangle 4 Fall 2 8 yes
blue 5 square 4 Summer 8 1 yes
blue 9 square 11 Summer 8 7 no
red 3 circle 2 Summer 1 5 yes
red 7 triangle 6 Winter 7 9 no
blue 9 square 2 Fall 7 4 yes
I think I can do something like:
df <- df %>%
  mutate(new_var = case_when(
    color == "red" & c(2,4,6,7) < 3 ~ "Yes",
    color == "blue" & c(2,4,6,7) < 5 ~ "Yes",
    TRUE ~ "No"))
But I don't know if it is possible to choose the columns by position like this. Any advice would be great!
Answer:
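You can't use raw column indexes like that: c(2,4,6,7) is just a numeric vector, so c(2,4,6,7) < 3 compares the numbers 2, 4, 6 and 7 with 3 instead of looking at the columns in those positions. What you can use is if_any():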
library(dplyr)

df %>%
  mutate(
    new_var = case_when(
      color == "red"  & if_any(starts_with("num"), ~ . < 3) ~ "yes",
      color == "blue" & if_any(starts_with("num"), ~ . < 5) ~ "yes",
      # rows that match neither condition fall through to "no"
      TRUE ~ "no")
  )
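If you do want to pick the columns by position, as in the question, here is a minimal sketch: if_any() takes a tidyselect specification, and tidyselect accepts integer positions, so c(2, 4, 6, 7) can go directly into if_any(). This assumes the column order of the example data (num_1, num_2, num_3 and num_4 at positions 2, 4, 6 and 7); selecting by name with starts_with() is more robust if the column order can change.

```r
library(dplyr)

df <- read.table(text = "
color num_1 shape num_2 season num_3 num_4
red 1 triangle 4 Fall 2 8
blue 5 square 4 Summer 8 1
green 3 square 11 Summer 4 1
red 3 circle 2 Summer 1 5
red 7 triangle 6 Winter 7 9
blue 9 square 2 Fall 7 4", header = TRUE)

# Outside a tidyselect context, c(2, 4, 6, 7) < 3 is an ordinary vector
# comparison that never looks at df: it gives TRUE FALSE FALSE FALSE.
print(c(2, 4, 6, 7) < 3)

# Inside if_any(), the same numbers are interpreted as column positions
# (2, 4, 6, 7 = num_1, num_2, num_3, num_4 in this data).
out <- df %>%
  mutate(
    new_var = case_when(
      color == "red"  & if_any(c(2, 4, 6, 7), ~ . < 3) ~ "yes",
      color == "blue" & if_any(c(2, 4, 6, 7), ~ . < 5) ~ "yes",
      TRUE ~ "no"
    )
  )

out
```

With the example data, new_var comes out as yes, yes, no, yes, no, yes; the green row matches neither colour condition, so it gets "no".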
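The functions across(), if_any(), and if_all() are all related: they let you use the tidyselect helpers (such as starts_with()) to work with several columns at once. if_any() is TRUE for a row when the condition holds in at least one of the selected columns, if_all() is TRUE only when it holds in all of them, and across() applies a function to each selected column separately.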
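To see how if_all() and across() differ from if_any() on the example data, here is a small sketch. The column name all_below_10 and the _lt3 suffix are made up for illustration; if_all() requires the condition in every selected column, while across() returns one new column per selected column.

```r
library(dplyr)

df <- read.table(text = "
color num_1 shape num_2 season num_3 num_4
red 1 triangle 4 Fall 2 8
blue 5 square 4 Summer 8 1
green 3 square 11 Summer 4 1
red 3 circle 2 Summer 1 5
red 7 triangle 6 Winter 7 9
blue 9 square 2 Fall 7 4", header = TRUE)

# if_all(): TRUE only when every num_* value in the row is below 10
# (all_below_10 is a hypothetical name chosen for this example)
res_all <- df %>%
  mutate(all_below_10 = if_all(starts_with("num"), ~ . < 10))

# across(): applies the test to each num_* column separately and, via the
# .names glue specification, stores the results as num_1_lt3, num_2_lt3, ...
res_across <- df %>%
  mutate(across(starts_with("num"), ~ . < 3, .names = "{.col}_lt3"))

res_all
res_across
```

Only the green row (num_2 = 11) fails all_below_10, and if_any(starts_with("num"), ~ . < 3) is simply the row-wise "or" of the four _lt3 columns that across() produces.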