Is there a way to remove all periods from a string unless it is a dot in a number in R?

Time:08-20

I am working with a dataset that contains a text variable, and I am not experienced at cleaning text. I tried my best but could not find an answer. Take this text as an example:

"I want. to remove all ... from the text except 5.3 or .5"

I want the output to be:

"I want to remove all from the text except 5.3 or .5"

Could someone help me with that?

CodePudding user response:

You could try:

library(stringr)

str_remove_all("I want to remove all ... from the text except 5.3.", "((?<!\\d)\\.(?!\\d)|\\.$)")
#> [1] "I want to remove all  from the text except 5.3"

The pattern has two alternatives joined by | inside a group (...|...). The first, (?<!\\d)\\.(?!\\d), matches a period that has no digit immediately before it and no digit immediately after it. The second, \\.$, removes a period at the very end of the string, which the first alternative misses when that period directly follows a digit, as in "5.3.".
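
To check the pattern against the sentence from the question, here is a minimal sketch. It assumes stringr is installed. The helper name remove_stray_periods is hypothetical, and collapsing the leftover double space with str_squish() is an extra step that the answer above does not include.

```r
library(stringr)

# Hypothetical helper: drop periods that are not part of a number,
# then collapse the whitespace left behind.
remove_stray_periods <- function(x) {
  # Alternative 1: a period with no digit directly before or after it.
  # Alternative 2: a period at the very end of the string.
  cleaned <- str_remove_all(x, "(?<!\\d)\\.(?!\\d)|\\.$")
  # str_squish() trims both ends and reduces runs of whitespace to one space.
  str_squish(cleaned)
}

remove_stray_periods("I want. to remove all ... from the text except 5.3 or .5")
#> [1] "I want to remove all from the text except 5.3 or .5"
```

Without the str_squish() step, the two spaces that surrounded "..." remain in the result.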

CodePudding user response:

You can try gsub, as shown below:

> gsub("(?<=\\D)\\.(?=\\D)", "", "I want. to remove all ... from the text except 5.3 or .5", perl = TRUE)
[1] "I want to remove all  from the text except 5.3 or .5"
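
The two styles of lookaround behave differently at the edges of a string, which a short sketch can show. The positive lookarounds (?<=\\D) and (?=\\D) each require a non-digit character to be present, so a period at the very start or end of the string is left alone. The negative lookarounds (?<!\\d) and (?!\\d) only require that no digit is adjacent. The test sentence below is made up for illustration.

```r
x <- "Done. See 5.3 or .5 here."

# Positive lookarounds: a non-digit character must exist on both sides,
# so the final period (with nothing after it) survives.
positive <- gsub("(?<=\\D)\\.(?=\\D)", "", x, perl = TRUE)
positive
#> [1] "Done See 5.3 or .5 here."

# Negative lookarounds: it is enough that no digit is adjacent,
# so the final period is removed as well.
negative <- gsub("(?<!\\d)\\.(?!\\d)", "", x, perl = TRUE)
negative
#> [1] "Done See 5.3 or .5 here"
```

Neither pattern removes a period that directly follows a digit, such as the last character of "5.3.", so that case still needs something like the \\.$ alternative.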