Extract word1 between two words A and B

I want to extract word1, which sits between two words A and B, without the whitespace before and after it. I also don't want the match to run on to a later occurrence of B.

Example:

library(stringr)
pattern <- "(?<=wA).*(?=wB)"
str1 <- "qzpdjpqz wA    Hello world !                   wB  edjifdjiq"
str2 <- "qzpdjpqz wA    Hello world !                   wB  wB"

str_match_all(str1, pattern)  
str_match_all(str2, pattern)

str11 <- "qzpdjpqz wA    word1                   wB  edjifdjiq\n 


qzpdjpqz wA                    word2                   wB 
wB\n

qzpdjpqz gregegt wA    word3                   wB  wB\n rsgeef vfsfeqz 

wA    word4 wB                 "

Desired result for str1 and str2: "Hello world !"
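To see what goes wrong with the pattern above, here is a small base R sketch that reproduces its matches with regmatches()/regexpr() and perl = TRUE (PCRE supports the same lookarounds as stringr). It shows both problems: the lookarounds leave the padding spaces inside the match, and the greedy .* in str2 runs on to the last 'wB'.

```r
str1 <- "qzpdjpqz wA    Hello world !                   wB  edjifdjiq"
str2 <- "qzpdjpqz wA    Hello world !                   wB  wB"

greedy <- "(?<=wA).*(?=wB)"

# regexpr() returns the leftmost match; the greedy .* then extends it
# to the LAST position that is still followed by "wB"
m1 <- regmatches(str1, regexpr(greedy, str1, perl = TRUE))
m2 <- regmatches(str2, regexpr(greedy, str2, perl = TRUE))

m1  # padding spaces are still part of the match
m2  # also contains the first "wB" and the spaces after it
```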

Answer:

Here's a base R option using sub() that drops the whitespace after 'wA' and before 'wB'.

str1 <- "qzpdjpqz wA    Hello world !                   wB  edjifdjiq"
str2 <- "qzpdjpqz wA    Hello world !                   wB  wB"

sub('.*wA\\s+(.*?)\\s+wB.*', '\\1', c(str1, str2))
#[1] "Hello world !" "Hello world !"
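The same idea can be wrapped in a small function. This is a sketch, not part of the answer: extract_between() and its a/b arguments are hypothetical names, the delimiters are assumed to be plain words without regex metacharacters, and \\s* is used instead of \\s+ so that zero spaces are also accepted. Because sub() returns non-matching strings unchanged, the sketch turns those into NA.

```r
str1 <- "qzpdjpqz wA    Hello world !                   wB  edjifdjiq"
str2 <- "qzpdjpqz wA    Hello world !                   wB  wB"

# Hypothetical helper; assumes `a` and `b` contain no regex metacharacters
extract_between <- function(x, a = "wA", b = "wB") {
  # greedy .* picks the last `a`; lazy (.*?) stops at the first `b` after it
  pattern <- paste0(".*", a, "\\s*(.*?)\\s*", b, ".*")
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching input as-is, so mark it explicitly
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

res <- extract_between(c(str1, str2, "no delimiters here"))
res
#> [1] "Hello world !" "Hello world !" NA
```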

Answer:

If the number of whitespace characters is known, you can count them in the lookarounds.

stringr::str_match_all(c(str1, str2), '(?<=wA\\s{4}).*(?=\\s{19}wB.+)')
# [[1]]
# [,1]           
# [1,] "Hello world !"
# 
# [[2]]
# [,1]           
# [1,] "Hello world !"
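The counted pattern only works while the padding is exactly 4 spaces after 'wA' and 19 before 'wB'. A small base R sketch of that limitation (PCRE via perl = TRUE accepts the fixed-length lookbehind \\s{4}); str3 is a made-up variant with different padding, used only for illustration.

```r
counted <- "(?<=wA\\s{4}).*(?=\\s{19}wB.+)"

# Built with strrep() so the space counts are explicit:
# 4 spaces after wA and 19 before wB, as in str1 from the question
str1 <- paste0("qzpdjpqz wA", strrep(" ", 4), "Hello world !",
               strrep(" ", 19), "wB  edjifdjiq")
# Hypothetical variant with 2 and 5 spaces instead
str3 <- paste0("qzpdjpqz wA", strrep(" ", 2), "Hello world !",
               strrep(" ", 5), "wB  edjifdjiq")

hits <- lapply(c(str1, str3), function(x)
  regmatches(x, regexpr(counted, x, perl = TRUE)))
str(hits)  # first string matches, second gives character(0)
```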

Otherwise, match everything between 'wA' and 'wB' with regmatches(regexpr()) and strip the padding with trimws().

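# \(x) is the R >= 4.1 shorthand for function(x); wB.+ requires at least
# one character after wB, so the greedy .* cannot end at a trailing wB
sapply(list(str1, str2), \(x) 
       trimws(regmatches(x, regexpr('(?<=wA).*(?=wB.+)', x, perl=TRUE))))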
# [1] "Hello world !" "Hello world !"
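Both answers return a single match per string, while str11 in the question holds several 'wA ... wB' pairs. A sketch, assuming each pair sits on one line and the wanted text is whatever lies between a 'wA' and the nearest following 'wB': replace the greedy .* with the lazy .*?, collect every match with gregexpr(), and trim afterwards. The string x is a shortened, made-up stand-in for str11.

```r
# Shortened stand-in for str11: several wA ... wB pairs, one per line
x <- "aaa wA    word1      wB  bbb\nccc wA   word2  wB\nwB\nwA word3 wB  wB"

# Lazy .*? stops at the nearest wB, so a later wB is never swallowed.
# With perl = TRUE, "." does not match "\n", so a match cannot span lines.
lazy <- "(?<=wA).*?(?=wB)"

words <- lapply(regmatches(x, gregexpr(lazy, x, perl = TRUE)), trimws)
words[[1]]
#> [1] "word1" "word2" "word3"
```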