Consider this character string:
mystring <- "this, this and this, and this, and this."
I want to split on "," or "and", but I also want to get rid of the empty strings. I am puzzled that my solution below does not work.
Splitting works OK:
> str_split(mystring, regex(',|and'))
[[1]]
[1] "this" " this " " this" " " " this" " " " this."
Filtering does not work:
> str_split(mystring, regex(',|and')) %>% purrr::keep(., function(x) x!= '')
Error: Predicate functions must return a single `TRUE` or `FALSE`, not a logical vector of length 7
Run `rlang::last_error()` to see where the error occurred.
What is the issue here? Thanks!
Answer:
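If the split produces only empty strings ("") rather than whitespace-only strings (" "), we can drop them with nzchar. Absorbing the surrounding whitespace into the split pattern achieves this: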
library(purrr)
library(stringr)
str_split(mystring, regex('\\s*,\\s*|\\s*and\\s*'))[[1]] %>%
  keep(nzchar)
[1] "this" "this" "this" "this" "this."
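To see why nzchar is enough with the pattern above, it may help to look at the pieces before filtering. This is a minimal sketch; the variable name pieces is only illustrative and is not part of the original answer.

```r
library(stringr)

mystring <- "this, this and this, and this, and this."

# The whitespace around each delimiter is consumed by the pattern, so where
# two delimiters are adjacent (", " followed by "and ") the piece between
# them is a true empty string rather than a lone space.
pieces <- str_split(mystring, regex("\\s*,\\s*|\\s*and\\s*"))[[1]]
pieces
# [1] "this"  "this"  "this"  ""      "this"  ""      "this."

# nzchar() is FALSE only for "", so it flags exactly the two empty pieces.
nzchar(pieces)
```

With the original pattern ',|and' those two pieces would be " " instead of "", and nzchar would return TRUE for them, which is why that version needs the trimws step.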
If we are using the OP's code, use trimws before the keep step:
str_split(mystring, regex(',|and')) %>%
  pluck(1) %>%
  trimws %>%
  keep(nzchar)
[1] "this" "this" "this" "this" "this."
In the OP's code, keep did not work because str_split returns a list and its element was never extracted. keep therefore applied the predicate to the single list element, a character vector of length 7, and got back seven TRUE/FALSE values where it expects exactly one. Here, pluck(1) extracts that list element first; in the first solution, [[1]] does the same job.
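The list structure described above can be checked directly. As a possible alternative to extracting the element, the cleanup can also be mapped over the list, which would carry over to a vector of several input strings. This is a sketch only; the names parts and cleaned are illustrative, and str_trim is used here in place of trimws.

```r
library(purrr)
library(stringr)

mystring <- "this, this and this, and this, and this."

parts <- str_split(mystring, regex(",|and"))

# str_split() returns a list with one character vector per input string.
is.list(parts)      # TRUE
length(parts)       # 1
length(parts[[1]])  # 7, which is why the predicate returned 7 values

# Apply trim + keep to each list element instead of extracting it first.
cleaned <- map(parts, ~ keep(str_trim(.x), nzchar))
cleaned[[1]]
# [1] "this"  "this"  "this"  "this"  "this."
```

The result stays a list, so cleaned[[1]] is still needed if a plain character vector is wanted for a single input string.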