remove words based on first letter using stringr

Time:02-10

I want to remove all words that start with "a" in a string.

Input:

string <- "This is a sentence about nothing."

My attempt:

stringr::str_remove_all(string,"a*\\b")

output I got:

[1] "This is  sentence about nothing."

output I want:

[1] "This is sentence nothing."

I am not sure how to detect a word based on one letter but perform an action (e.g., remove, replace) on the whole word. Any input is appreciated!

CodePudding user response:

The a*\b pattern matches zero or more a chars followed by a word boundary. It only matches a whole word when that word consists entirely of a chars, which is why only the standalone "a" was removed and "about" was left alone.
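To see this behaviour in isolation, here is a minimal sketch using base R's gsub() with perl=TRUE instead of stringr (an assumption made so the snippet runs without extra packages). On this input it should reproduce the output from the question:

```r
string <- "This is a sentence about nothing."

# a* may match zero characters, so inside "about" the only match is the
# empty one at the start of the word; the letters themselves are kept.
attempt <- gsub("a*\\b", "", string, perl = TRUE)
print(attempt)
# [1] "This is  sentence about nothing."
```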

You can use

stringr::str_remove_all(string,"\\ba\\w*")
stringr::str_replace_all(string,"\\ba\\w*", "")
gsub("\\ba\\w*", "", string, perl=TRUE) ## ASCII only letters/digits

where \ba\w* matches a word boundary, a, and then zero or more word chars.
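As a quick check, this sketch applies the pattern to the string from the question (base R only, so it runs without stringr). The words are removed, but the spaces on both sides of each removed word stay behind:

```r
string <- "This is a sentence about nothing."

# \b anchors at the start of a word, "a" is the first letter,
# and \w* consumes the rest of the word.
without_a_words <- gsub("\\ba\\w*", "", string, perl = TRUE)
print(without_a_words)
# [1] "This is  sentence  nothing."
```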

If you also want to remove any whitespace before the word, add \s* at the start:

stringr::str_remove_all(string,"\\s*\\ba\\w*")
stringr::str_replace_all(string,"\\s*\\ba\\w*", "")
gsub("\\s*\\ba\\w*", "", string, perl=TRUE) ## ASCII only letters/digits/whitespaces
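The whitespace-aware pattern can be wrapped in a small function. This is only a sketch: remove_words_starting_with is a hypothetical helper name, it assumes letter is a single plain letter (not a regex metacharacter), and it adds trimws() to drop the space left over when the very first word is removed. Matching is case-sensitive, so "About" would be kept.

```r
# Hypothetical helper: drops every word that starts with `letter`,
# together with any whitespace directly in front of it.
remove_words_starting_with <- function(x, letter = "a") {
  pattern <- paste0("\\s*\\b", letter, "\\w*")
  # trimws() removes the leading space left when the first word is dropped
  trimws(gsub(pattern, "", x, perl = TRUE))
}

remove_words_starting_with("This is a sentence about nothing.")
# [1] "This is sentence nothing."
remove_words_starting_with("an apple fell down")
# [1] "fell down"
```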

If you need to make sure you only remove natural language words consisting only of letters, then you can replace \w with \p{L}:

stringr::str_remove_all(string,"\\s*\\ba\\p{L}*")
stringr::str_replace_all(string,"\\s*\\ba\\p{L}*", "")
gsub("(*UCP)\\s*\\ba\\p{L}*", "", string, perl=TRUE) ## any Unicode letters/digits/whitespaces