I want to remove all words that start with "a" in a string.
Input:
string <- "This is a sentence about nothing."
My attempt:
stringr::str_remove_all(string,"a*\\b")
output I got:
[1] "This is sentence about nothing."
output I want:
[1] "This is sentence nothing."
I am not sure how to detect a word based on one letter but perform an action (e.g., remove, replace) on the whole word. Any input is appreciated!
CodePudding user response:
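The a*\b pattern matches zero or more a chars followed by a word boundary. It only removes a run of a chars that ends a word, so it does not match a whole word unless that word consists only of a chars (like the standalone a in your string).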
You can use
stringr::str_remove_all(string,"\\ba\\w*")
stringr::str_replace_all(string,"\\ba\\w*", "")
gsub("\\ba\\w*", "", string, perl=TRUE) ## ASCII only letters/digits
where \ba\w* matches a word boundary, the letter a, and then zero or more word chars.
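As a quick check, here is a minimal sketch of the base R variant applied to the string from the question. The variable name result is only for illustration. Note that the spaces around each removed word are left behind, so the output contains double spaces:

```r
string <- "This is a sentence about nothing."

# \\b anchors at the start of a word, "a" is the required first letter,
# and \\w* consumes the rest of that word
result <- gsub("\\ba\\w*", "", string, perl = TRUE)
print(result)
# [1] "This is  sentence  nothing."
```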
If you also want to remove any whitespace before the word, add \s* at the start:
stringr::str_remove_all(string,"\\s*\\ba\\w*")
stringr::str_replace_all(string,"\\s*\\ba\\w*", "")
gsub("\\s*\\ba\\w*", "", string, perl=TRUE) ## ASCII only letters/digits/whitespaces
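A small sketch of the whitespace variant on the question's input, which gives the output asked for. The second call is an added illustration (the input "a cat sat" is made up): since \s* only consumes whitespace before the word, a removed word at the very start of the string leaves the following space in place, and trimws() can clean that up if needed.

```r
string <- "This is a sentence about nothing."

# \\s* also consumes the whitespace in front of each removed word
result <- gsub("\\s*\\ba\\w*", "", string, perl = TRUE)
print(result)
# [1] "This is sentence nothing."

# A removed word at the start of the string leaves a leading space behind
leading <- gsub("\\s*\\ba\\w*", "", "a cat sat", perl = TRUE)
print(leading)
# [1] " cat sat"
print(trimws(leading))
# [1] "cat sat"
```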
If you need to make sure you only remove natural language words consisting only of letters, then you can replace \w with \p{L}:
stringr::str_remove_all(string,"\\s*\\ba\\p{L}*")
stringr::str_replace_all(string,"\\s*\\ba\\p{L}*", "")
gsub("(*UCP)\\s*\\ba\\p{L}*", "", string, perl=TRUE) ## any Unicode letters/digits/whitespaces
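To see how \w and \p{L} differ, here is a hedged sketch on a made-up input containing the token a1. With \p{L}* the match stops at the digit, so only the a of a1 is removed. The trailing \b in the last call is my own addition, not part of the patterns above; it is one possible way to leave such mixed tokens untouched.

```r
x <- "an a1 test"

# \\w* treats digits as word chars, so "a1" is removed as a whole
with_w <- gsub("\\s*\\ba\\w*", "", x, perl = TRUE)
print(with_w)
# [1] " test"

# \\p{L}* matches letters only, so just the "a" of "a1" is removed
with_l <- gsub("(*UCP)\\s*\\ba\\p{L}*", "", x, perl = TRUE)
print(with_l)
# [1] "1 test"

# Assumption: adding a trailing \\b requires the match to end at a word boundary,
# so "a1" is kept intact
with_l_b <- gsub("(*UCP)\\s*\\ba\\p{L}*\\b", "", x, perl = TRUE)
print(with_l_b)
# [1] " a1 test"
```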