Remove punctuation from text (except the symbol &)

Time:03-29

I need to remove punctuation from the text:

data <- "Type the command AT&W enter. in order to save the new protocol on modem;"
gsub('[[:punct:] ]+', ' ', data)

This solution gives the result

[1] "Type the command AT W enter in order to save the new protocol on modem "

This is not the desired result because I would like to keep the &, hence:

[1] "Type the command AT&W enter in order to save the new protocol on modem "

CodePudding user response:

You could try a user-defined regex that matches anything that is not an &, an alphanumeric character, or a space:

data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

gsub('[^&[:alnum:] ]+', ' ', data)
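One detail worth checking with this approach: the space sits inside the negated class, so existing spaces are kept, and each removed punctuation run is replaced by an additional space. The sketch below assumes the pattern is `[^&[:alnum:] ]+` and adds an optional cleanup step; the variable names `res` and `clean` are only illustrative.

```r
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

# Replace every run of characters that are not "&", alphanumeric, or a space
res <- gsub("[^&[:alnum:] ]+", " ", data)
print(res)    # "enter. in" becomes "enter  in" (two spaces), and a trailing space remains

# Optional cleanup: collapse repeated spaces, then trim both ends
clean <- trimws(gsub(" +", " ", res))
print(clean)
```

If single spacing matters, the cleanup step (or replacing with an empty string instead of a space) avoids the doubled space.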

CodePudding user response:

What about doing the inverse? That is, replace everything that is not a letter, a digit, whitespace, or & with an empty string:

gsub("[^[:alnum:][:space:]&]", "", data)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"

CodePudding user response:

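Here's another regex, which literally means "find all punctuation characters except &". The negated class excludes everything that is not punctuation (\P{P}) as well as &, so it matches only punctuation other than &.

gsub("[^\\P{P}&]", "", data, perl = TRUE)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"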
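The last two answers give the same result on the sample sentence, but they are not equivalent in general. As far as I know, the Unicode category `P` does not cover symbol characters such as `$` or `+`, while the `[:alnum:]`/`[:space:]` whitelist removes them. A small comparison, using a made-up test string `x` that is not part of the original question:

```r
x <- "Cost: $5 + tax (AT&W)"  # hypothetical sample string

# Whitelist approach: drop everything except letters, digits, whitespace, and "&"
posix_res <- gsub("[^[:alnum:][:space:]&]", "", x)

# Unicode-property approach: drop only characters in category P, except "&"
unicode_res <- gsub("[^\\P{P}&]", "", x, perl = TRUE)

print(posix_res)    # "Cost 5  tax AT&W"   ($ and + are removed too)
print(unicode_res)  # "Cost $5 + tax AT&W" ($ and + are kept)
```

Which one fits depends on whether currency and math symbols count as punctuation for your data.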