Using quantifiers in look-arounds (R/stringr)

Time:04-21

I'd like to extract the name John Doe from the following string:

str <- 'Name: |             |John Doe     |'

I can do:

library(stringr)
str_extract(str,'(?<=Name: \\|             \\|).*(?=     \\|)')
[1] "John Doe"

But that involves typing a lot of spaces, and it breaks when the number of spaces is not fixed. When I try to use a quantifier (+) instead, I get an error:

str_extract(str,'(?<=Name: \\| +\\|).*(?= +\\|)')
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
  Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT, context=`(?<=Name: \| +\|).*(?= +\|)`)

The same goes for other variants:

str_extract(str,'(?<=Name: \\|\\s+\\|).*(?=\\s+\\|)')
str_extract(str,'(?<=Name: \\|\\s{1,}\\|).*(?=\\s{1,}\\|)')

Is there a solution to this?

CodePudding user response:

How about this: first we remove "Name", then we replace every non-alphanumeric character with a space, and finally we str_squish() the result to collapse the extra whitespace.

library(stringr)

str_squish(str_replace_all( str_remove(str, "Name"), "[^[:alnum:]]", " "))
[1] "John Doe"

CodePudding user response:

Another solution using base R:

sub("Name: \\|\\s+\\|(.*\\S)\\s+\\|", "\\1", str)
# [1] "John Doe"
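
Since the question uses stringr, the same capture-group idea can probably be expressed with str_match() as well. This is only a sketch and not part of the answer above: the variable names `m` and `name` are made up for illustration, and it assumes the field always ends in a non-space character followed by padding and a pipe.

```r
library(stringr)

str <- 'Name: |             |John Doe     |'

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group.
m <- str_match(str, "Name: \\|\\s+\\|(.*\\S)\\s+\\|")

# Keep only the captured name (hypothetical variable name).
name <- m[, 2]
name
```

Because the unbounded quantifiers sit outside any look-behind, the bounded-length restriction does not apply here.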

CodePudding user response:

You might also use \K to keep what has been matched so far out of the regex match.

Name: \|\h+\|\K.*?(?=\h+\|)

Explanation

  • Name: \| Match Name: |
  • \h+\| Match 1+ horizontal whitespace chars and |
  • \K Forget what is matched so far
  • .*? Match as few chars as possible
  • (?=\h+\|) Positive lookahead, assert 1+ horizontal whitespace chars to the right followed by |


Example

library(stringr)

str <- 'Name: |             |John Doe     |'
# \K and \h need the PCRE engine, hence perl = TRUE
regmatches(str, regexpr("Name: \\|\\h+\\|\\K.*?(?=\\h+\\|)", str, perl = TRUE))

Output

[1] "John Doe"
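
To check that the \K pattern copes with a varying amount of padding, which was the original concern, it could be wrapped in a small helper and run over several inputs. This is a minimal sketch: `extract_name` and the sample `inputs` are hypothetical and not from the thread.

```r
# Hypothetical helper: return the field that follows "Name: |<padding>|".
extract_name <- function(x) {
  # perl = TRUE selects PCRE, which supports \K and \h.
  m <- regexpr("Name: \\|\\h+\\|\\K.*?(?=\\h+\\|)", x, perl = TRUE)
  # regmatches() drops elements of x that did not match.
  regmatches(x, m)
}

# Made-up inputs with different amounts (and kinds) of horizontal padding.
inputs <- c(
  "Name: | |John Doe |",
  "Name: |             |John Doe     |",
  "Name: |\t|Jane Roe  |"
)

extract_name(inputs)
```

Note that non-matching strings are silently dropped from the result, so the output can be shorter than the input vector.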