How do I remove entire strings if they contain a matched pattern in R?

Time:05-17

Say I have the following string:

vector <- "this is a string of text containing stuff. something.com [email protected] and other stuff with something.anything"

I would like to remove a word if it contains @ or ., so I would like to remove something.com, [email protected] and something.anything. I do not want to remove stuff, because the period after it only ends the sentence; the word itself does not contain a "." or "@". Ideally I would like to be able to do this with the %>% pipe.

Answer:

gsub(" ?\\w+[.@]\\S+", "", vector)

[1] "this is a string of text containing stuff. and other stuff with"
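Since the question asks for a pipeline, here is a minimal sketch of the same gsub call in pipe form. It assumes R 4.1 or later for the native |> pipe: naming pattern and replacement lets the piped string bind to gsub's remaining argument, x. The sample text swaps in a hypothetical address, someone@example.com, for the obfuscated one in the question.

```r
# Sample text; "someone@example.com" is a hypothetical stand-in for the
# obfuscated address in the question.
vector <- "this is a string of text containing stuff. something.com someone@example.com and other stuff with something.anything"

# A word is removed when it has word characters, then "." or "@", then at
# least one more non-space character. "stuff." does not qualify because
# nothing follows its period. The optional leading space is removed too.
pattern <- " ?\\w+[.@]\\S+"

# Native pipe (R >= 4.1): with pattern and replacement named, the piped
# value fills gsub's x argument.
cleaned <- vector |> gsub(pattern = pattern, replacement = "")

# With magrittr loaded, the equivalent is: vector %>% gsub(pattern, "", .)

print(cleaned)
# [1] "this is a string of text containing stuff. and other stuff with"
```

The " ?" at the start of the pattern is what keeps the result free of doubled spaces; without it, each removed word would leave its preceding space behind.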

Answer:

An alternative to the (much terser and simpler) gsub method:

gre <- gregexpr("[^ ]+[.@][^ ]+", vector)
regmatches(vector, gre)
# [[1]]
# [1] "something.com"      "[email protected]"     "something.anything"
regmatches(vector, gre) <- ""
vector
# [1] "this is a string of text containing stuff.   and other stuff with "

The advantage of this approach is that each match can be replaced with an arbitrary value. Granted, here we only replace them with "", so it is a little overkill, but if you need to transform each matched substring individually, this is a more powerful mechanism.
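To illustrate replacing each match with its own value, here is a minimal sketch that, instead of deleting the matches, swaps each one for a tag. The tagging rule (e-mail-like matches become <email>, the rest <link>) and the address someone@example.com are hypothetical, chosen only for the example.

```r
# Sample text; "someone@example.com" is a hypothetical stand-in for the
# obfuscated address in the question.
vector <- "this is a string of text containing stuff. something.com someone@example.com and other stuff with something.anything"

# Same pattern as above: non-spaces, then "." or "@", then more non-spaces.
gre <- gregexpr("[^ ]+[.@][^ ]+", vector)

# regmatches() returns a list holding one character vector of matches per input string.
found <- regmatches(vector, gre)

# Hypothetical rule: tag matches containing "@" as <email>, everything else as <link>.
relabel <- function(m) ifelse(grepl("@", m, fixed = TRUE), "<email>", "<link>")

# Assigning a list of the same shape replaces each match with its own value.
regmatches(vector, gre) <- lapply(found, relabel)
print(vector)
# [1] "this is a string of text containing stuff. <link> <email> and other stuff with <link>"
```

Each replacement vector must have the same length as the matches found in the corresponding input string; mapping a function over the list returned by regmatches() guarantees that.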
