I need to remove punctuation from the text:
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"
gsub('[[:punct:] ]+', ' ', data)
This solution gives the result
[1] "Type the command AT W enter in order to save the new protocol on modem "
This is not the desired result because I would like to keep the &, hence:
[1] "Type the command AT&W enter in order to save the new protocol on modem "
CodePudding user response:
You could try a user-defined regex that matches anything that is not an &, a space, or an alphanumeric character:
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"
gsub('[^&[:alnum:] ]+', ' ', data)
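For reference, here is a small sketch of what this approach returns on the sample string. It assumes the pattern is meant to carry a `+` quantifier after the bracket expression. Because the space sits inside the negated class, the removed `.` leaves a second space behind; the variant below (my own tweak, not part of the answer) takes the space out of the class so that runs like `. ` collapse into one space.

```r
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

# Assumed pattern: '+' after the bracket expression.
# Spaces are excluded from the match, so each removed character is
# replaced by one extra space next to the existing ones.
spaced <- gsub("[^&[:alnum:] ]+", " ", data)
spaced
# [1] "Type the command AT&W enter  in order to save the new protocol on modem "

# Variant: without the space in the class, a run such as ". " is matched
# as a whole and replaced by a single space.
collapsed <- gsub("[^&[:alnum:]]+", " ", data)
collapsed
# [1] "Type the command AT&W enter in order to save the new protocol on modem "

# Optional: trimws() drops the trailing space left by the final ';'.
trimmed <- trimws(collapsed)
```

The `collapsed` result matches the desired output in the question, including the trailing space.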
CodePudding user response:
What about doing the inverse, i.e. replacing everything that is not a letter, a digit, whitespace, or & with an empty string?
gsub("[^[:alnum:][:space:]&]", "", data)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"
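If more than one character needs to survive, the same idea can be wrapped in a small function. This is only a sketch: `strip_except` and its `keep` argument are hypothetical names, and it assumes `keep` contains no characters that are special inside a bracket expression (such as `]`, `^`, `-` or `\`).

```r
# Hypothetical helper: remove everything except letters, digits,
# whitespace, and the characters listed in `keep`.
# Assumption: `keep` has no bracket-expression metacharacters (], ^, -, \).
strip_except <- function(x, keep = "&") {
  pattern <- paste0("[^[:alnum:][:space:]", keep, "]")
  gsub(pattern, "", x)
}

data <- "Type the command AT&W enter. in order to save the new protocol on modem;"
kept_amp <- strip_except(data)
kept_amp
# [1] "Type the command AT&W enter in order to save the new protocol on modem"

# Made-up second example: keep '&', '*' and '#'.
kept_more <- strip_except("Dial *99# then AT&W.", keep = "&*#")
kept_more
# [1] "Dial *99# then AT&W"
```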
CodePudding user response:
Here's another regex, which literally means "find all punctuation except &".
gsub("[^\\P{P}&]", "", data, perl = TRUE)
[1] "Type the command AT&W enter in order to save the new protocol on modem"
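The double negative may be easier to follow next to an equivalent form. `[^\P{P}&]` matches a character that is neither "not punctuation" nor `&`, which is the same as a `\p{P}` character guarded by a negative lookahead for `&`. One thing to keep in mind: `\p{P}` is the Unicode punctuation category, which as far as I know is narrower than POSIX `[[:punct:]]`; symbols such as `+` and `=` are not in it. The string `x` below is a made-up example to show that difference.

```r
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

# Double-negated class versus an explicit negative lookahead.
double_neg <- gsub("[^\\P{P}&]", "", data, perl = TRUE)
lookahead  <- gsub("(?!&)\\p{P}", "", data, perl = TRUE)
identical(double_neg, lookahead)
# [1] TRUE

# Made-up string containing symbols that are not in \p{P}.
x <- "a+b=c; AT&W!"
unicode_p <- gsub("[^\\P{P}&]", "", x, perl = TRUE)
unicode_p
# [1] "a+b=c AT&W"

# POSIX punct also covers '+' and '='.
posix_p <- gsub("(?!&)[[:punct:]]", "", x, perl = TRUE)
posix_p
# [1] "abc AT&W"
```

So if symbols like `+`, `=` or `$` should go as well, the `[[:punct:]]` form is probably the one to use.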