How can I remove the URL (that contains "www") but not any other word that contains "w"?
This is my R code
textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"
# URL without https
text <- gsub("(W|w|W|w)(.)(\\S*)", "", textz)
text
# output
[1] "Please don't notification from just call the office during "
How can I maintain the word "w8" and "weekdays"? I just want to remove the URL in this context. Thank you in advance!
CodePudding user response:
Maybe
textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"
# URL without https
text <- gsub("[wW]{3}\\S+ ", "", textz)
text
#"Please don't w8 notification from just call the office during weekdays"
The regular expression "[wW]{3}\S+ " means:
[wW] Look for w or W,
{3} exactly 3 repetitions of the previous item.
\S+ one or more non-space characters, followed by a single space.
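The pattern above also matches any word that merely starts with three w's, and it needs a space after the URL. A slightly stricter variant, offered only as a sketch: it assumes the URLs of interest always begin with "www." and end at the next whitespace. The names `pattern` and `cleaned` are made up for this example.

```r
textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"

# \\b     : start at a word boundary, so "www" cannot begin in the middle of a word
# www\\.  : the literal text "www." (ignore.case = TRUE also accepts Www., WWW., ...)
# \\S+    : the rest of the URL, up to the next whitespace
# \\s*    : any whitespace after the URL, so no double space is left behind
pattern <- "\\bwww\\.\\S+\\s*"

cleaned <- gsub(pattern, "", textz, ignore.case = TRUE, perl = TRUE)
cleaned
```

Because the trailing whitespace is optional here, a URL at the very end of the string is removed as well, while "w8" and "weekdays" are left alone since neither contains "www.".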
CodePudding user response:
Maybe store the characters in a vector, then only access the items after the first three, since the first three will always be www.
For how to split a string into individual characters and store them in a vector, see:
Determine all characters present in a vector of strings
Of course, you would first have to split the string up so that the website URL is separate from the rest.
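One possible reading of this idea, as a rough sketch rather than the exact method described: split the sentence into words instead of single characters, then check the first three characters of each word. It assumes words are separated by single spaces and that anything starting with "www" is a URL. The names `words`, `is_url` and `result` are made up for this example.

```r
textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"

# Split on single spaces so that each word (and the URL) is its own element
words <- strsplit(textz, " ", fixed = TRUE)[[1]]

# Flag the elements whose first three characters are "www", ignoring case
is_url <- tolower(substr(words, 1, 3)) == "www"

# Keep everything that is not flagged and glue the sentence back together
result <- paste(words[!is_url], collapse = " ")
result
```

Note that punctuation attached to the URL (the trailing comma here) is dropped together with it, because it belongs to the same space-separated element.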