Extract the shortest, first-encountered match between two strings in R

Time:11-24

I want a function that returns the string satisfying both of the following conditions:

  1. It comes after "def".
  2. It is inside the parentheses immediately before the first "%ile" after "def".

So the desired output is "4", not "5". So far, I have been able to extract "2)(3)(4". If I switch the function to str_extract_all, the output becomes "2)(3)(4" and "5". I can't figure out how to fix this. Thanks!

library(stringr)

x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"

string.after.match <- str_match(string = x,     
                                pattern = "(?<=def)(.*)")[1, 1]

parentheses.value <- str_extract(string.after.match,         # get value in ()
                                 "(?<=\\()(.*?)(?=\\)\\%ile)")

parentheses.value


CodePudding user response:

sub(".*?def.*?(\\d)\\)%ile.*", "\\1", x)
[1] "4"
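
One thing to watch with this approach: sub() returns its input unchanged when the pattern does not match, so a string without "def" would come back whole. Below is a minimal sketch of a wrapper that returns NA instead. The helper name first_ile_after_def is hypothetical, and perl = TRUE, the \\d+ (to allow multi-digit values) and the explicit \\( are my additions rather than part of the answer above.

```r
# Hypothetical helper, not from the original answer.
first_ile_after_def <- function(x) {
  # Lazy .*? stops at the first "def", then at the first "(digits)%ile" after it.
  pattern <- "^.*?def.*?\\((\\d+)\\)%ile.*$"
  # sub() leaves non-matching input untouched, so check for a match first.
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

first_ile_after_def("abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile")  # "4"
first_ile_after_def("abc(0)(1)%ile")                           # NA
```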

CodePudding user response:

Here is a one-liner using gsub() that does the trick for this input:

gsub(".*def.*(\\d+)\\)%ile.*%ile", "\\1", x, perl = TRUE)

Here's an approach that works with any number of "%ile"s, based on str_split():

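library(stringr)

x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile(9)%ile"
x %>% 
  str_split("def", simplify = TRUE) %>%     # split on "def"
  subset(TRUE, 2) %>%                       # keep the part after "def"
  str_split("%ile", simplify = TRUE) %>%    # split on "%ile"
  subset(TRUE, 1) %>%                       # keep the part before the first "%ile"
  str_replace(".*(\\d+)\\)$", "\\1")        # keep the digits in the last ()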
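
The same split-then-trim idea can also be sketched in base R, in case you would rather not load stringr. This is an assumption-laden variant rather than the answer's own code: it keeps everything after the first "def" (instead of splitting on every "def"), and the intermediate names after_def and before_ile are made up for illustration.

```r
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile(9)%ile"

# Drop everything up to and including the first "def".
after_def <- sub("^.*?def", "", x, perl = TRUE)
# Keep only the text before the first "%ile" in what remains.
before_ile <- strsplit(after_def, "%ile", fixed = TRUE)[[1]][1]
# Extract the digits inside the last pair of parentheses.
result <- sub("^.*\\((\\d+)\\)$", "\\1", before_ile, perl = TRUE)
result
```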

CodePudding user response:

You can use

x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"
library(stringr)
result <- str_match(x, "\\bdef(?:\\((\\d+)\\))+%ile")
result[,2]


Details:

  • \b - a word boundary
  • def - the literal string def
  • (?:\((\d+)\))+ - one or more occurrences of (, one or more digits (captured into Group 1), and ); only the digits from the last occurrence are kept in Group 1
  • %ile - the literal string %ile.
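
To see the "last occurrence wins" behaviour of the repeated group, here is a small sketch that runs the same pattern over a few inputs. It uses base R's regexec() with perl = TRUE in place of stringr::str_match() so that it has no package dependencies. The extra test strings and the names inputs, matches and last_group are made up for illustration.

```r
pattern <- "\\bdef(?:\\((\\d+)\\))+%ile"
inputs <- c(
  "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile",  # several groups before the first %ile
  "def(7)%ile",                              # a single group
  "no match here"                            # no "def" at all
)

# Each list element holds the full match followed by Group 1,
# or character(0) when the pattern does not match.
matches <- regmatches(inputs, regexec(pattern, inputs, perl = TRUE))

# Pull out Group 1, using NA where there was no match.
last_group <- vapply(
  matches,
  function(g) if (length(g) >= 2) g[2] else NA_character_,
  character(1),
  USE.NAMES = FALSE
)
last_group
```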