How to split and filter a string?

Time:10-14

Consider this character string:

mystring <- "this, this and this, and this, and this."

I want to split on "," or "and", but I also want to get rid of the empty strings. I am puzzled that my solution below does not work.

Splitting works OK:

> str_split(mystring, regex(',|and'))
[[1]]
[1] "this"   " this " " this"  " "      " this"  " "      " this."

Filtering does not work:

> str_split(mystring, regex(',|and')) %>% purrr::keep(., function(x) x!= '')
Error: Predicate functions must return a single `TRUE` or `FALSE`, not a logical vector of length 7
Run `rlang::last_error()` to see where the error occurred.

What is the issue here? Thanks!

Answer:

If we make the split return empty strings ("") instead of strings of spaces (" ") by absorbing the surrounding whitespace into the regex, then we can drop them with nzchar:

library(purrr)
library(stringr)
str_split(mystring, regex('\\s*,\\s*|\\s*and\\s*'))[[1]]  %>%
    keep(nzchar)
[1] "this"  "this"  "this"  "this"  "this."
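To see why nzchar alone is not enough with the OP's original regex, it helps to look at the lengths of the pieces it produces. A minimal check, reusing the OP's mystring, shows that the short pieces are single spaces rather than empty strings:

```r
library(stringr)

mystring <- "this, this and this, and this, and this."
pieces <- str_split(mystring, regex(",|and"))[[1]]

nchar(pieces)            # 4 6 5 1 5 1 6: the "empty" pieces are one space long
nzchar(pieces)           # all TRUE, so keep(nzchar) alone would drop nothing
nzchar(trimws(pieces))   # FALSE at positions 4 and 6 once whitespace is trimmed
```

The same applies to the OP's x != '' predicate: a single space is not equal to "", so even with the list element extracted it would keep every piece.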

If we keep the OP's regex instead, extract the vector and apply trimws before the keep step, so that the single-space pieces become empty strings:

str_split(mystring, regex(',|and')) %>%
    pluck(1) %>%
    trimws %>%
    keep(nzchar) 
[1] "this"  "this"  "this"  "this"  "this."
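For comparison, the same split, trim and filter steps can be sketched in base R without purrr or stringr. This is a swapped-in alternative rather than part of the answer's approach; it assumes base strsplit() with its default (non-Perl) regex engine, which handles this simple pattern the same way as stringr's regex():

```r
# Base R only: no purrr or stringr needed
mystring <- "this, this and this, and this, and this."

# strsplit() also returns a list, so take its first (and only) element
pieces <- strsplit(mystring, ",|and")[[1]]

# Trim surrounding whitespace, then keep only the non-empty strings
pieces <- trimws(pieces)
result <- pieces[nzchar(pieces)]
result
```

Note that the [[1]] extraction plays the same role here as pluck(1) does in the purrr version.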

In the OP's code, keep didn't work because str_split returns a list (one character vector per input string), and that element was never extracted. keep calls its predicate once per list element, so the predicate received the whole length-7 vector and returned seven TRUE/FALSE values, whereas keep expects a single TRUE or FALSE for each element. In the second solution we extract the vector with pluck(1); in the first, the extraction is done by [[1]].
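The list structure described above can be checked directly. The sketch below also assumes we might want to handle several input strings at once; the second string ("red, green and blue") is a hypothetical example added only for illustration. Calling keep inside map makes the predicate see one string at a time, which is what keep expects:

```r
library(purrr)
library(stringr)

mystring <- "this, this and this, and this, and this."
# Hypothetical second input, added only to show the per-string behaviour
strings <- c(mystring, "red, green and blue")

# str_split() returns a list with one character vector per input string
parts <- str_split(strings, regex(",|and"))
length(parts)         # 2, one element per input string
length(parts[[1]])    # 7 pieces for the first string

# keep() calls its predicate once per list element. Called directly on
# `parts`, the predicate would receive a whole character vector and return
# several TRUE/FALSE values, which is what triggered the error.
# Mapping over the list lets keep() filter within each vector instead.
cleaned <- map(parts, function(x) keep(trimws(x), nzchar))
cleaned
```

With a single input string this reduces to the pluck(1) version, but the result stays a list, so it generalizes to any number of strings.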
