I want the function to return the string that satisfies the following conditions:
- it comes after "def"
- it is inside the parentheses right before the first %ile after "def"
So the desired output is "4", not "5". So far, I have been able to extract "2)(3)(4". If I change the function to str_extract_all, the output becomes "2)(3)(4" and "5". I couldn't figure out how to fix this. Thanks!
library(stringr)
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"
# keep everything after "def"
string.after.match <- str_match(string = x,
                                pattern = "(?<=def)(.*)")[1, 1]
# get value in (); this currently returns "2)(3)(4" instead of "4"
parentheses.value <- str_extract(string.after.match,
                                 "(?<=\\()(.*?)(?=\\)\\%ile)")
parentheses.value
CodePudding user response:
sub(".*?def.*?(\\d)\\)%ile.*", "\\1", x)
[1] "4"
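The pattern above captures a single digit. If the value in the parentheses can have more than one digit, a variant along these lines might help. It is only a sketch: the input with "(42)" is made up for illustration, and note that sub() returns the input unchanged when nothing matches.

```r
# Made-up input with a two-digit value, for illustration only
x2 <- "abc(0)(1)%ile, def(2)(3)(42)%ile(5)%ile"

# Lazy .*? stops at the first "(digits)%ile" after "def";
# \\((\\d+)\\) captures the whole number between the parentheses
value <- sub(".*?def.*?\\((\\d+)\\)%ile.*", "\\1", x2, perl = TRUE)
value
```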
CodePudding user response:
Here is a one-liner that will do the trick using gsub():
gsub(".*def.*(\\d+)\\)%ile.*%ile", "\\1", x, perl = TRUE)
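Here's an approach, based on str_split(), that will work with any number of "%ile"s:
library(stringr)
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile(9)%ile"
x %>%
  str_split("def", simplify = TRUE) %>%   # split on "def"
  subset(TRUE, 2) %>%                     # keep the part after "def"
  str_split("%ile", simplify = TRUE) %>%  # split on "%ile"
  subset(TRUE, 1) %>%                     # keep the part before the first "%ile"
  str_replace(".*(\\d+)\\)$", "\\1")      # keep the last digit before the closing ")"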
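For anyone who would rather avoid the stringr dependency, the same split-then-trim idea can probably be written in base R roughly like this. The variable names are placeholders of mine, and it assumes x is a single string that contains "def".

```r
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile(9)%ile"

# Part of the string after the first "def"
after_def <- strsplit(x, "def", fixed = TRUE)[[1]][2]

# Part of that before the first "%ile"
before_ile <- strsplit(after_def, "%ile", fixed = TRUE)[[1]][1]

# Keep only the digits inside the last pair of parentheses
value <- sub(".*\\((\\d+)\\)$", "\\1", before_ile)
value
```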
CodePudding user response:
You can use
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"
library(stringr)
result <- str_match(x, "\\bdef(?:\\((\\d+)\\))+%ile")
result[,2]
Details:
- \b: a word boundary
- def: a "def" string
- (?:\((\d+)\))+: one or more occurrences of "(", one or more digits (captured into Group 1), and ")"; the last occurrence captured is the one saved in Group 1
- %ile: a "%ile" string.
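If it helps, the same pattern can be wrapped in a small base R helper, sketched below. The name extract_def_value is hypothetical (it is not part of stringr or base R), and the helper assumes a single input string. It uses regexec() with perl = TRUE, where a repeated capture group keeps its last iteration.

```r
# Hypothetical helper: returns the digits in the parentheses right before
# the first "%ile" after "def", or NA when the pattern does not match.
extract_def_value <- function(x) {
  m <- regmatches(x, regexec("\\bdef(?:\\((\\d+)\\))+%ile", x, perl = TRUE))[[1]]
  # m[1] is the full match, m[2] is Group 1 (last repetition)
  if (length(m) < 2) NA_character_ else m[2]
}

extract_def_value("abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile")
```

Returning NA for a non-match makes it easier to tell "no value found" apart from a real result, which sub() and gsub() do not do on their own.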