I have a vector with delimiters and I want to generate a vector of the same length with boolean values based on whether or not one of the delimited values contains what I am after. I cannot find a way to do this neatly in vector-based logic. As an example:
x <- c('a', 'a; b', 'ab; c', 'b; c', 'c; a', 'c')
Using some magic asking whether 'a' %in% x, I want to get the vector:
TRUE, TRUE, FALSE, FALSE, TRUE, FALSE
I initially tried the following:
'a' %in% trimws(strsplit(x, ';'))
But this unexpectedly collapses the entire list and returns TRUE, rather than a vector, since one of the elements in x is 'a'. Is there a way to get the vector I am looking for without rewriting the code into a for-loop?
CodePudding user response:
Update: To consider white spaces:
library(stringr)
x <- str_replace_all(string = x, pattern = " ", replacement = "")
x
[1] "a" "a;b" "ab;c" "b;c" "c;a" "c"
str_detect(x, '(^|;)a(;|$)')
[1] TRUE TRUE FALSE FALSE TRUE FALSE
First answer:
If you want to use str_detect, you have to account for the delimiter ;:
library(stringr)
str_detect(x, 'a$|a;')
[1] TRUE TRUE FALSE FALSE TRUE FALSE
CodePudding user response:
Base R:
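grepl("a", x)
Note that this matches any element containing the letter a, so 'ab; c' also returns TRUE. If you explicitly want to use %in%, split on the delimiter first so that whole values are compared:
sapply(strsplit(x, ";\\s*"), function(v) "a" %in% v)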
When working with strings I always use the great stringr library. A word boundary (\\b) on each side of the letter keeps 'ab; c' from matching:
library(stringr)
x <- c('a', 'a; b', 'ab; c', 'b; c', 'c; a', 'c')
str_detect(x, "\\ba\\b")
CodePudding user response:
You can read each item separately with scan, trim leading and trailing whitespace as you attempted, and test each resulting character vector in turn with:
sapply(x, function(x){"a" %in% trimws( scan( text=x, what="",sep=";", quiet=TRUE))})
a a; b ab; c b; c c; a c
TRUE TRUE FALSE FALSE TRUE FALSE
The top row of the result is just the names and would not affect a logical test that depended on this result. There is an unname function if needed.
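As a sketch of the same split, trim, and test idea without the names row, vapply with USE.NAMES = FALSE returns a bare logical vector. Note that strsplit is swapped in for scan here, and the name res is only illustrative:

```r
x <- c('a', 'a; b', 'ab; c', 'b; c', 'c; a', 'c')

# For each string: split on ";", trim whitespace from the pieces,
# then test whether "a" is one of them.
# logical(1) makes vapply fail loudly if a result is not a single logical;
# USE.NAMES = FALSE stops vapply from naming the result after the input strings.
res <- vapply(
  x,
  function(s) "a" %in% trimws(strsplit(s, ";", fixed = TRUE)[[1]]),
  logical(1),
  USE.NAMES = FALSE
)
res
```

Unlike sapply, vapply checks the type and length of every result, which can catch surprises when the input is not what you expected.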
CodePudding user response:
If you would like to use %in%, here is a base R option
> mapply(`%in%`, list("a"), strsplit(x, ";\\s+"))
[1] TRUE TRUE FALSE FALSE TRUE FALSE
A more efficient way might be to use grepl, as below
> grepl("\\ba\\b",x)
[1] TRUE TRUE FALSE FALSE TRUE FALSE
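If the value you are looking for may contain punctuation, the word-boundary regex can over-match (for instance, "\\ba\\b" also matches inside 'a-b'), so an exact comparison after splitting may be safer. A minimal sketch, where has_token is a hypothetical helper name and a literal (non-regex) separator is assumed:

```r
# Hypothetical helper: TRUE where `token` is exactly one of the
# `sep`-delimited values of each element of `x`.
has_token <- function(x, token, sep = ";") {
  # fixed = TRUE treats `sep` as a literal string, not a regular expression
  pieces <- strsplit(x, sep, fixed = TRUE)
  # Trim whitespace around each piece before the exact comparison
  vapply(pieces, function(p) token %in% trimws(p), logical(1))
}

x <- c('a', 'a; b', 'ab; c', 'b; c', 'c; a', 'c')
has_token(x, "a")
has_token(x, "ab")
```

With the example vector, has_token(x, "a") gives the same result as the two options above, while has_token("a-b; c", "a") is FALSE where the word-boundary regex would return TRUE.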