Say I have the string -
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
I would like to remove numbers that are 7, 8, or 4 characters long, EXCEPT if the number is 1000. So essentially I want the following result -
"this is a string with some numbers 1000"
CodePudding user response:
Use gsub here with the regex pattern \b(?:\d{7,8}|(?!1000\b)\d{4})\b, which matches any 7- or 8-digit number, or any 4-digit number other than 1000:
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
output <- gsub("\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b", "", some_string, perl=TRUE)
output
[1] "this is a string with some numbers   1000  "
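To check what the pattern will delete before running the replacement, the matches can be listed first. This is a minimal base R sketch; the names `pattern` and `hits` are illustrative and not part of the answer above.

```r
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"

# \\b ... \\b        : word boundaries, so digits inside a longer number are not matched
# \\d{7,8}           : a run of 7 or 8 digits
# (?!1000\\b)\\d{4}  : a run of 4 digits, unless that run is exactly 1000
pattern <- "\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b"

# perl = TRUE is required because the negative lookahead (?!...) is PCRE syntax
hits <- regmatches(some_string, gregexpr(pattern, some_string, perl = TRUE))[[1]]
print(hits)
```

Here `hits` should hold the four numbers to be removed, and 1000 should not be among them.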
Actually, a better version, which tidies up loose whitespace, would be this:
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
output <- gsub("\\s*(?:\\d{7,8}|(?!1000\\b)\\d{4})\\s*", " ", some_string, perl=TRUE)
output <- gsub("^\\s+|\\s+$", "", gsub("\\s{2,}", " ", output))
output
[1] "this is a string with some numbers 1000"
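One caveat with this second version: its pattern has no \b anchors, so \d{4} can also match the first four digits of a longer number. The sketch below shows the difference on a hypothetical input containing a 5-digit number (20022), which is not in the question's string. The names `x`, `no_bounds`, and `with_bounds` are illustrative.

```r
x <- "numbers 1000 20022 2022"  # hypothetical input; 20022 should survive

# Without word boundaries: "2002" is matched inside "20022", leaving a stray "2"
no_bounds <- gsub("\\s*(?:\\d{7,8}|(?!1000\\b)\\d{4})\\s*", " ", x, perl = TRUE)

# With word boundaries: only whole 4-, 7-, or 8-digit numbers are matched
with_bounds <- gsub("\\s*\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b\\s*", " ", x, perl = TRUE)

# trimws() strips the trailing space left behind by the replacement
no_bounds <- trimws(no_bounds)
with_bounds <- trimws(with_bounds)
print(no_bounds)
print(with_bounds)
```

If the input can contain numbers of other lengths, keeping the \b anchors in the whitespace-tidying pattern looks like the safer choice.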
CodePudding user response:
A stringr option that keeps 1000 and any number whose length is not 4, 7, or 8. (The sample data includes one number of length 5.)
library(stringr)
"this is a string with some numbers 9639998 21057535 1000 2021 20022 2022" |>
str_remove_all("\\b(?!1000\\b)(\\d{7,8}|\\d{4})\\b") |>
str_squish()
#> [1] "this is a string with some numbers 1000 20022"
Created on 2022-05-17 by the reprex package (v2.0.1)
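Where the negative lookahead sits matters for numbers that merely start with 1000. A small sketch, assuming stringr is installed and using a hypothetical input with an 8-digit number (10001234) that is not in the question; the names `loose` and `strict` are illustrative.

```r
library(stringr)

x <- "1000 10001234 2021"  # hypothetical input; only 1000 should survive

# Lookahead without a trailing \\b: protects anything that starts with 1000
loose <- x |>
  str_remove_all("(?!1000)\\b(\\d{7,8}|\\d{4})\\b") |>
  str_squish()

# Lookahead anchored with \\b on both sides: protects only the exact number 1000
strict <- x |>
  str_remove_all("\\b(?!1000\\b)(\\d{7,8}|\\d{4})\\b") |>
  str_squish()

print(loose)
print(strict)
```

With the unanchored lookahead, 10001234 is left in place even though it is 8 digits long; the anchored form removes it and keeps only 1000.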