I'm struggling to find an answer to the following problem.
I want to search a column in a data.frame by a vector. Upon finding a match I then wish to utilise the element of the 'search vector' to create a new element of a new column. See the reproducible example below please.
colour <- c('red', 'yellow')
a <- c('violet', 'red', 'taupe', 'blue', 'yellow_a', 'yellow_b', 'blue_a', 'red_c')
b <- c('non', 'prim', 'non', 'prim', 'prim', 'prim', 'prim', 'prim')
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
df <- data.frame(a, b, c)
I've tried the following:
df_clean <- df %>% mutate(d = if_else(str_detect(a, colour), colour, NA_character_))
The Output:
Problem: Looking at the help files, I'm unable to get 'if_else' to accept a 'true' value of length greater than 1. I'm receiving the following error:
Error: Problem with `mutate()` column `d`.
ℹ `d = if_else(rep(str_detect(a, colour), length(colour)), colour, NA_character_)`.
x `true` must be length 16 (length of `condition`) or one, not 2.
I'm looking to achieve:
a <- c('violet', 'red', 'taupe', 'blue', 'yellow_a', 'yellow_b', 'blue_a', 'red_c')
b <- c('non', 'prim', 'non', 'prim', 'prim', 'prim', 'prim', 'prim')
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
d <- c(NA_character_, 'red', NA_character_, NA_character_, 'yellow', 'yellow', NA_character_, 'red')
df_clean <- data.frame(a, b, c, d)
Requirements:
If you could help me fix this or find an alternative solution, I would be most grateful. I'm unable to bridge the gap and may be missing something obvious.
Any help would be greatly appreciated!
Many Thanks
CodePudding user response:
A potential solution uses str_extract() from the stringr package.
library(dplyr)
library(stringr)
colour <- c('red', 'yellow')
a <- c('violet', 'red', 'taupe', 'blue', 'yellow_a', 'yellow_b', 'blue_a', 'red_c')
b <- c('non', 'prim', 'non', 'prim', 'prim', 'prim', 'prim', 'prim')
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
df <- data.frame(a, b, c)
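# Collapse the search vector into one regular expression: "red|yellow"
colour_str <- paste(colour, collapse = '|')
# str_extract() returns the first match in each element of a, or NA if none
df |>
  mutate(d = str_extract(a, colour_str))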
Output:
         a    b c      d
1   violet  non 1   <NA>
2      red prim 2    red
3    taupe  non 3   <NA>
4     blue prim 4   <NA>
5 yellow_a prim 5 yellow
6 yellow_b prim 6 yellow
7   blue_a prim 7   <NA>
8    red_c prim 8    red
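To make the pattern step concrete, here is a minimal sketch of what paste() with collapse builds and how the same approach extends when the search vector grows. The extra colour 'blue' and the name colour_more are hypothetical, added only for illustration.

```r
library(stringr)

colour <- c('red', 'yellow')

# Collapse the search vector into a single alternation pattern
colour_str <- paste(colour, collapse = '|')
print(colour_str)

# Hypothetical extended search vector: no other code needs to change
colour_more <- c(colour, 'blue')
extended <- str_extract(c('violet', 'blue_a', 'red_c'),
                        paste(colour_more, collapse = '|'))
print(extended)
```

Note that the elements of the search vector are treated as regular expressions here, so this sketch assumes they contain no regex metacharacters.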
CodePudding user response:
You were very close to the solution:
a <- c('violet', 'red', 'taupe', 'blue', 'yellow_a', 'yellow_b', 'blue_a', 'red_c')
b <- c('non', 'prim', 'non', 'prim', 'prim', 'prim', 'prim', 'prim')
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
d <- c(NA_character_, 'red', NA_character_, NA_character_, 'yellow', 'yellow', NA_character_, 'red')
my_df <- data.frame(a, b, c, d)
library(stringr)
pattern <- "red|yellow"
# Keep the matched colour where a matches the pattern, otherwise NA
my_df$test <- ifelse(
  test = str_detect(string = my_df$a, pattern = pattern),
  yes = str_extract(string = my_df$a, pattern = pattern),
  no = NA_character_
)
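As a side note, str_extract() already returns NA for elements with no match, so the ifelse() wrapper may not be strictly necessary. A small sketch comparing the two forms on the same vector (the names with_ifelse and direct are just for illustration):

```r
library(stringr)

a <- c('violet', 'red', 'taupe', 'blue', 'yellow_a', 'yellow_b', 'blue_a', 'red_c')
pattern <- "red|yellow"

# Version with the explicit ifelse() check
with_ifelse <- ifelse(str_detect(a, pattern),
                      str_extract(a, pattern),
                      NA_character_)

# Version relying on str_extract() returning NA for non-matches
direct <- str_extract(a, pattern)

# Compare the two results
print(identical(with_ifelse, direct))
```

The explicit check can still be worth keeping if the 'no' branch should hold something other than NA.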