How to select the number in a text? (R)


I want to convert Latin and English number words in a text into digits. For example, in the text "……one telephone……", I want to change the English number "one" into "1", but I do not want "telephone" to become "teleph1".

Is it correct to match the number word only when it has a space before it and a space after it? How can I do that?

Answer 1:

Here is one way:

gsub(" one ", " 1 ", ".. one telephone ..")
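A quick check of the plain-space pattern shows where it falls short: "one" at the very start of the string or directly before punctuation has no space on that side, so it is left alone. The test strings below are made up for illustration.

```r
# Plain-space pattern from above, applied to strings where "one" is not
# always surrounded by spaces on both sides.
x <- c(".. one telephone ..",   # spaces on both sides: replaced
       "one telephone",         # start of string, no leading space: unchanged
       "I have one.")           # followed by a period: unchanged
gsub(" one ", " 1 ", x)
# returns c(".. 1 telephone ..", "one telephone", "I have one.")
```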

You may need more rules than "space before and space after", e.g. to handle punctuation. Here is an example that accepts either a blank or a punctuation character before "one":

gsub("([[:punct:]]|[[:blank:]])one ", "\\11 ", "..one telephone ..")

You can do something similar after "one". The \\1 in the second argument refers to whatever is matched inside the parentheses ( ... ) in the first argument.
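Applying the same idea on both sides of the word might look like the sketch below. It is one possible pattern, not the only one: the left group also accepts the start of the string (^), the right group also accepts the end ($), and both delimiters are captured so the replacement can put them back. The test string is made up.

```r
# Delimiters allowed around "one": start/end of string, punctuation,
# or whitespace. They are captured so they can be restored.
pattern <- "(^|[[:punct:]]|[[:space:]])one([[:punct:]]|[[:space:]]|$)"

# "\\1" restores the left delimiter, "1" is the digit, "\\2" restores the
# right delimiter. R backreferences are single digits (\1 to \9), so
# "\\11" reads as "\\1" followed by a literal "1".
replacement <- "\\11\\2"

gsub(pattern, replacement, "one telephone, one. phone")
# returns "1 telephone, 1. phone"
```

Because the delimiters are consumed as part of each match, back-to-back occurrences such as "one one" can leave the second word unchanged; word boundaries avoid that problem.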

See ?gsub and ?regex to learn more about regular expressions in R.

Answer 2:

To avoid turning parts of other words into numbers, you can include word boundaries in the search pattern. Some functions have a dedicated option for this, but in general you can use the special sequence \\b to mark a word boundary, as long as the function supports regular expressions.

For example, \\bone\\b will only match "one" if it is not part of another word. That way you can apply it to your character string "……one telephone……" without having to rely on spaces as delimiters between words.
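The same word-boundary pattern also works with base R's gsub. A minimal sketch, assuming perl = TRUE (PCRE) so that \\b is read as a word boundary; the input strings are made up:

```r
# \\b matches at the edge between a word character and a non-word character
# (or the start/end of the string), so the "one" inside "telephone" is skipped.
x <- c("..one telephone..", "one telephone", "I have one.")
gsub("\\bone\\b", "1", x, perl = TRUE)
# returns c("..1 telephone..", "1 telephone", "I have 1.")
```

Unlike the space-based pattern, this also handles "one" at the start of the string and "one" followed by punctuation.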

With the stringr package (part of the Tidyverse), the replacement might look like this:

# define test string
x <- "……one telephone……"

# define dictionary for replacements with \\b indicating word boundaries
dict <- c(
  "\\bone\\b" = "1",
  "\\btwo\\b" = "2",
  "\\bthree\\b" = "3"
)

# replace matches in x
stringr::str_replace_all(x, dict)
#> [1] "……1 telephone……"

Created on 2022-11-11 with reprex v2.0.2
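If you would rather not depend on stringr, a dictionary-style replacement can be sketched in base R by looping over the words. The helper replace_number_words() and its ignore_case argument are hypothetical names for illustration; the sketch assumes the dictionary contains plain words without regex metacharacters.

```r
# Hypothetical helper: replace each number word with its digit, one
# pattern at a time, using word boundaries (PCRE via perl = TRUE).
replace_number_words <- function(text, words, digits, ignore_case = FALSE) {
  stopifnot(length(words) == length(digits))
  for (i in seq_along(words)) {
    pattern <- paste0("\\b", words[i], "\\b")
    text <- gsub(pattern, digits[i], text,
                 perl = TRUE, ignore.case = ignore_case)
  }
  text
}

words  <- c("one", "two", "three")
digits <- c("1", "2", "3")

replace_number_words("One telephone, two phones and three cables",
                     words, digits, ignore_case = TRUE)
# returns "1 telephone, 2 phones and 3 cables"
```

Since the patterns are applied in order, a dictionary that also contains multi-word numbers (e.g. "twenty one") would need those entries before their single-word parts.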