I want to extract the text (word1) that sits between two words, A and B, without the spaces before and after word1. I also don't want the match to run on to a later occurrence of word B.
Example:
library(stringr)
pattern <- "(?<=wA).*(?=wB)"
str1 <- "qzpdjpqz wA Hello world ! wB edjifdjiq"
str2 <- "qzpdjpqz wA Hello world ! wB wB"
str_match_all(str1, pattern)
str_match_all(str2, pattern)
str11 <- "qzpdjpqz wA word1 wB edjifdjiq\n
qzpdjpqz wA word2 wB
wB\n
qzpdjpqz gregegt wA word3 wB wB\n rsgeef vfsfeqz
wA word4 wB "
desired result -> "Hello world !"
CodePudding user response:
Here's a base R option with sub, removing the whitespace after 'wA' and before 'wB'.
str1 <- "qzpdjpqz wA Hello world ! wB edjifdjiq"
str2 <- "qzpdjpqz wA Hello world ! wB wB"
sub('.*wA\\s+(.*?)\\s+wB.*', '\\1', c(str1, str2))
#[1] "Hello world !" "Hello world !"
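If a string holds several wA ... wB pairs, as str11 in the question does, a lazy capture can be combined with gregexpr() to collect every pair. This is only a minimal sketch, assuming the delimiters are plain words without regex metacharacters. extract_between is a hypothetical helper name, and the sample string is a simplified copy of str11.

```r
# Sketch: pull every "wA ... wB" payload out of a string with several pairs.
# `extract_between` is a hypothetical helper, not part of any package.
extract_between <- function(x, open = "wA", close = "wB") {
  # \\s* outside the capture group swallows the padding; (.*?) is lazy, so
  # each match stops at the first closing word instead of the last one.
  pattern <- paste0(open, "\\s*(.*?)\\s*", close)
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  # Each hit is a full "wA ... wB" match; keep only the captured payload.
  sub(pattern, "\\1", hits, perl = TRUE)
}

# Simplified copy of str11 from the question (assumed layout).
str11 <- "qzpdjpqz wA word1 wB edjifdjiq\nqzpdjpqz wA word2 wB\nwB\nqzpdjpqz gregegt wA word3 wB wB\n rsgeef vfsfeqz\nwA word4 wB "
extract_between(str11)
# [1] "word1" "word2" "word3" "word4"
```

Because the padding is matched outside the capture group, no trimws() call is needed afterwards.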
CodePudding user response:
If the number of whitespace characters is known, count them in the pattern. The pattern below assumes exactly 4 after 'wA' and 19 before 'wB'.
stringr::str_match_all(list(str1, str2), '(?<=wA\\s{4}).*(?=\\s{19}wB.+)')
# [[1]]
# [,1]
# [1,] "Hello world !"
#
# [[2]]
# [,1]
# [1,] "Hello world !"
Otherwise, try regmatches() with regexpr() and trim the result with trimws().
sapply(list(str1, str2), \(x)
trimws(regmatches(x, regexpr('(?<=wA).*(?=wB.+)', x, perl=TRUE))))
# [1] "Hello world !" "Hello world !"
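As an aside on why the pattern in the question overshoots on str2: `.*` is greedy, so it runs on to the last wB in the string, whereas a lazy `.*?` stops at the first one. A small base R sketch of the difference, using str2 from the question (the variable names greedy and lazy are just for illustration):

```r
str2 <- "qzpdjpqz wA Hello world ! wB wB"

# Greedy: .* extends as far as possible, so the lookahead lands on the last wB.
greedy <- regmatches(str2, regexpr("(?<=wA).*(?=wB)", str2, perl = TRUE))

# Lazy: .*? extends as little as possible, so the lookahead lands on the first wB.
lazy <- regmatches(str2, regexpr("(?<=wA).*?(?=wB)", str2, perl = TRUE))

trimws(greedy)
# [1] "Hello world ! wB"
trimws(lazy)
# [1] "Hello world !"
```

The lookarounds leave the surrounding spaces inside the match, which is why trimws() is still applied at the end.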