How to remove only "www" from a URL, not any other words that contain the "w" character

Time:09-22

How can I remove the URL (that contains "www") but not any other word that contains "w"?

This is my R code:

textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"

# URL without https
text <- gsub("(W|w|W|w)(.)(\\S*)", "", textz) 
text

# output
[1] "Please don't  notification from  just call the office during "

How can I keep the words "w8" and "weekdays"? I just want to remove the URL in this context. Thank you in advance!

Answer:

Maybe

textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"

# URL without https
text <- gsub("[wW]{3}\\S+", "", textz)
text

#"Please don't w8 notification from  just call the office during weekdays"

This regular expression, "[wW]{3}\S+", means:
[wW] look for w or W,
{3} exactly 3 of the previous character,
\S+ followed by one or more non-space characters.
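A slightly stricter variant, offered as a sketch rather than a drop-in fix: anchoring the match at a word boundary (`\\b`) and requiring a literal dot after the three w's leaves alone any word that merely contains "www" somewhere inside it, and a second `gsub()` call collapses the double space left where the URL was. It uses `perl = TRUE` so the boundary is handled by PCRE. Like the pattern above, it assumes the URL ends at the next whitespace, so the trailing comma is removed along with it.

```r
textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"

# \\b         : the match must start at a word boundary
# [wW]{3}\\.  : exactly three w/W characters followed by a literal dot
# \\S+        : the rest of the URL, up to the next whitespace
text <- gsub("\\b[wW]{3}\\.\\S+", "", textz, perl = TRUE)

# Collapse the double space left behind where the URL was removed
text <- gsub("\\s{2,}", " ", text, perl = TRUE)
text
#> [1] "Please don't w8 notification from just call the office during weekdays"
```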

Answer:

Maybe store the characters in a vector and then access only the items after the first three, since the first three will always be "www".

Here is how you would split the string into individual characters to store in a vector:

Determine all characters present in a vector of strings

Of course, you would first have to split the string up so that the website URL is separate from the rest.
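The idea above could be sketched in R roughly as follows. This is a minimal sketch under two assumptions: words are separated by single spaces, and a URL is any word that starts with "www." (case-insensitive). Each such word is split into individual characters with `strsplit(w, "")`, and only the characters after the first four ("www" plus the dot) are kept, so the rest of the address survives while "w8" and "weekdays" are untouched.

```r
textz <- "Please don't w8 notification from Www.example.com, just call the office during weekdays"

# Split the sentence into words so the URL is separate from the rest
words <- strsplit(textz, " ", fixed = TRUE)[[1]]

# Assumption: a URL is any word starting with "www." (any case)
is_url <- grepl("^www\\.", words, ignore.case = TRUE)

# For each URL, split it into individual characters and keep only the
# characters after the first four ("www" and the following dot)
words[is_url] <- vapply(words[is_url], function(w) {
  chars <- strsplit(w, "")[[1]]
  paste(chars[-(1:4)], collapse = "")
}, character(1), USE.NAMES = FALSE)

result <- paste(words, collapse = " ")
result
#> [1] "Please don't w8 notification from example.com, just call the office during weekdays"
```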
