Regular expression in R for finding spaces ONLY after a known word

Time:08-11

I have several character vectors like these in R:

a <- "text text NOTE      3/1"
b <- "text NOTE   4.3%"

All of them contain a known word, NOTE, which is followed by a variable number of spaces and then other characters.

What I want to do is find the spaces between NOTE and the characters that follow it, and replace each of those spaces with another character, say @.

The desired output would be:

"text text NOTE@@@@@@3/1"
"text NOTE@@@4.3%"

So far I have only managed to write a regular expression that matches NOTE together with the spaces that follow it, but it replaces the whole match with a single @:

c <- gsub("NOTE\\s+", "@", a)
c
[1] "text text @3/1"
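Part of the problem is that gsub() replaces each match as a whole, and a pattern such as NOTE\\s+ matches NOTE together with the entire run of spaces in a single match. A quick base-R check of the matches, reusing the question's a, makes this visible:

```r
a <- "text text NOTE      3/1"

# Collect every substring of `a` matched by the pattern
m <- regmatches(a, gregexpr("NOTE\\s+", a))[[1]]

length(m)  # 1: NOTE and all six spaces form one match
nchar(m)   # 10: "NOTE" (4 chars) plus 6 spaces
```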

Answer:

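Another option, using [[:space:]]. Note that this replaces every whitespace character in the string, so it only gives the desired output when the spaces after NOTE are the only ones, as in these shortened examples: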

a <- "NOTE      3/1"
b <- "NOTE   4.3%"

lapply(list(a,b), function(x) gsub("[[:space:]]", "@", x))
#> [[1]]
#> [1] "NOTE@@@@@@3/1"
#> 
#> [[2]]
#> [1] "NOTE@@@4.3%"

Created on 2022-08-10 by the reprex package (v2.0.1)
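Running the same substitution on the question's original strings, which have words before NOTE, shows that the spaces between those words are replaced as well:

```r
# The question's original strings, with words before NOTE
a <- "text text NOTE      3/1"
b <- "text NOTE   4.3%"

# [[:space:]] matches any whitespace, so every space is replaced
res <- gsub("[[:space:]]", "@", c(a, b))
res
#> [1] "text@text@NOTE@@@@@@3/1" "text@NOTE@@@4.3%"
```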

Answer:

You can use

gsub("(?:\\G(?!^)|NOTE)\\K\\s", "@", a, perl=TRUE)

a <- "text text NOTE      3/1"
b <- "text NOTE   4.3%"
gsub("(?:\\G(?!^)|NOTE)\\K\\s", "@", a, perl=TRUE)
# => [1] "text text NOTE@@@@@@3/1"
gsub("(?:\\G(?!^)|NOTE)\\K\\s", "@", b, perl=TRUE)
# => [1] "text NOTE@@@4.3%"

Details:

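  • (?:\G(?!^)|NOTE) - either the position right after the previous successful match (the (?!^) lookahead keeps \G from also matching at the start of the string) or NOTE
  • \K - the match reset operator, which drops the text matched so far from the reported match, so NOTE itself is kept and only the whitespace is replaced
  • \s - a whitespace char.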
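A small sketch comparing the pattern with and without (?!^), using a hypothetical string s (made up for illustration) that begins with a space:

```r
# Hypothetical input: a leading space, then NOTE followed by two spaces
s <- " a NOTE  x"

# Without (?!^), \G also matches at position 0, so the leading space is replaced
without_guard <- gsub("(?:\\G|NOTE)\\K\\s", "@", s, perl = TRUE)
without_guard
#> [1] "@a NOTE@@x"

# With (?!^), only the spaces chained right after NOTE are replaced
with_guard <- gsub("(?:\\G(?!^)|NOTE)\\K\\s", "@", s, perl = TRUE)
with_guard
#> [1] " a NOTE@@x"
```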

Here is a stringr version: the pattern NOTE\\s+ matches NOTE plus the run of whitespace after it, and the callback function(x) str_replace_all(x, "\\s", "@") replaces each whitespace char inside that match with @:

library(stringr)
stringr::str_replace_all(a, "NOTE\\s+", function(x) str_replace_all(x, "\\s", "@"))
# => [1] "text text NOTE@@@@@@3/1"
stringr::str_replace_all(b, "NOTE\\s+", function(x) str_replace_all(x, "\\s", "@"))
# => [1] "text NOTE@@@4.3%"
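The same "match NOTE plus its spaces, then replace inside the match" idea can also be sketched in base R without stringr, using gregexpr() together with the regmatches<- replacement function. This is an alternative technique, not part of the answer above:

```r
x <- c("text text NOTE      3/1", "text NOTE   4.3%")

# Locate every run of NOTE followed by whitespace
m <- gregexpr("NOTE\\s+", x)

# Within each matched piece, turn every whitespace char into @
regmatches(x, m) <- lapply(regmatches(x, m), function(v) gsub("\\s", "@", v))

x
#> [1] "text text NOTE@@@@@@3/1" "text NOTE@@@4.3%"
```

Because gregexpr() is used, every NOTE in a string is handled, and strings without a match are left unchanged.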