I have a list of functions I want to apply to a string consecutively, changing the string. For example, a list of regular expressions I want to remove from a string, e.g.
to_remove = c('a','b')
original = 'abcabd'
In this case, I could use a simple regular expression, e.g. 'a|b'.
library(stringr)
str_remove( original, paste0( to_remove, collapse='|'))
The actual situation is more complex than this, and the regular expression gets a little hairy. Also, I am curious how to do this in a proper R way.
Another way to phrase the question is, "How can I implement the following for loop using a vector approach?"
for( rem in to_remove) {
original = str_remove( original, rem )
}
CodePudding user response:
stringi offers vectorized string replacement. The replacement argument can be a vector of length 1 (here, "") or a vector of the same length as pattern.
to_remove = c('a','b')
original = 'abcabd'
stringi::stri_replace_all_fixed(
str = original,
pattern = to_remove,
replacement = "",
vectorize_all = FALSE
)
#> [1] "cd"
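Note that the patterns are applied one after another: per Stéphane's comment, stri_replace_all_* with vectorize_all = FALSE is recursive, so each pattern sees the result of the previous replacement. If that is not what you want, perhaps use an alternative pattern. See, for example, the output of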
stringi::stri_replace_all_regex(
"abc",
pattern = c("^a", "^b"),
replacement = "",
vectorize_all = FALSE
)
#> [1] "c"
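To make the order-dependence concrete, here is a minimal base R sketch that uses gsub as a stand-in for the stringi call (the variable names are illustrative, not part of the answer above). It contrasts applying the patterns one after another with collapsing them into a single alternation:

```r
x <- "abc"
patterns <- c("^a", "^b")

# Sequential: each pattern is applied to the result of the previous one,
# so "^b" matches once the leading "a" is gone.
sequential <- Reduce(function(s, p) gsub(p, "", s), patterns, x)

# Single alternation: every match is found in the original string,
# where "b" is not at the start.
combined <- gsub(paste0(patterns, collapse = "|"), "", x)

print(sequential)  # "c"
print(combined)    # "bc"
```

If the patterns cannot interact (for example, fixed strings that do not overlap), both approaches should give the same result.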
CodePudding user response:
I don't think that str_remove_all can take a list of inputs, but you can just use str_replace_all with a named vector to circumvent the issue:
str_replace_all("abcdabcd", c("^a" = "",
"b" = "",
"c" = "",
"d$" = ""))
[1] "da"
This solution works with regular expressions as well. The str_remove functions are just aliases for str_replace(string, pattern, ""), so there is no loss in computational speed.
Alternatively, you can set the names of an empty vector as the regular expressions as such:
to_remove <- c('a','b')
empty <- rep("", length(to_remove))
names(empty) <- to_remove
str_replace_all("abcd", empty)
Or you can pipe the string through a sequence of str_remove_all calls, like so:
"abcd" %>%
str_remove_all("d$") %>%
str_remove_all("a")
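If the steps are arbitrary functions rather than patterns, one possible approach is to fold a list of functions over the string with base R's Reduce. This is only a sketch: steps and apply_steps are hypothetical names, and sub (which removes only the first match, like str_remove) stands in for whatever each step really does.

```r
# Each step takes a string and returns the modified string.
steps <- list(
  function(s) sub("a", "", s, fixed = TRUE),
  function(s) sub("b", "", s, fixed = TRUE)
)

# Fold the functions over the input: the output of one step
# becomes the input of the next.
apply_steps <- function(x, fs) {
  Reduce(function(acc, f) f(acc), fs, x)
}

result <- apply_steps("abcabd", steps)
print(result)  # "cabd"
```

This mirrors the for loop from the question: the first step turns 'abcabd' into 'bcabd', and the second turns that into 'cabd'.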