Repeated capture group and lazy issue

Time:08-31

I want to capture "foo" and any occurrences of "bar". I also need to ignore any text between them, and "bar" is optional.

Example text:

foo ignoreme barbarbar
foo ignoreme bar
foo ignoreme 
foo something abcbar

Expected:

foo barbarbar
foo bar
foo
foo bar

I tried this regex:

(foo)(?:.*)((?:bar)*)

But the .* consumes the rest of the string, so only foo is captured:

foo
foo
foo
foo

So I changed it to lazy to stop the capture:

(foo)(?:.*?)((?:bar)*)

I got almost the same result: only foo is captured.

It seems the lazy match stops too early. However, this almost works:

(foo)(?:.*?)((?:bar)+)

foo barbarbar
foo bar
<miss third line>
foo bar

But it misses the third line because "bar" must now appear at least once. Example here: https://regex101.com/r/NIUPew/1

Any idea from a regex guru? Thanks!

CodePudding user response:

You can move the repeated capturing group into the non-capturing group while making that group optional:

(foo)(?:.*?((?:bar)+))?

See the regex demo.

Details:

  • (foo) - Group 1: foo
  • (?:.*?((?:bar)+))? - an optional non-capturing group that will be tried at least once (because ? is a greedy quantifier matching the quantified pattern one or zero times) to match
    • .*? - any zero or more chars other than line break chars, as few as possible
    • ((?:bar)+) - Group 2: one or more bar char sequences.
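As a quick check, here is a minimal Python sketch that runs this pattern over the four sample lines from the question. The names `PATTERN`, `SAMPLES` and `extract` are made up for illustration, and Python's `re` module is only one possible engine; other flavors may report the unmatched group differently.

```python
import re

# Pattern from the answer above: group 2 only participates
# when at least one "bar" follows "foo" on the same line.
PATTERN = re.compile(r"(foo)(?:.*?((?:bar)+))?")

SAMPLES = [
    "foo ignoreme barbarbar",
    "foo ignoreme bar",
    "foo ignoreme ",
    "foo something abcbar",
]

def extract(line):
    """Join the captured groups with a space; return None if "foo" is absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # m.group(2) is None when the optional group did not match, so filter it out.
    return " ".join(g for g in m.groups() if g)

results = [extract(s) for s in SAMPLES]
for r in results:
    print(r)
```

On the third line the optional group is skipped, so only Group 1 contributes to the output.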

CodePudding user response:

You can search using this regex:

(\bfoo) .*?(?: \w*?((?:bar)+)\w*)?$

and replace with:

$1 $2

Updated RegEx Demo

RegEx Breakup:

  • (\bfoo): 1st capture group to match foo after a word boundary
  •  .*?: A space followed by any text (lazy match)
  • (?: : Start a non-capture group that begins with a space
    • \w*?: Match 0 or more word chars (lazy)
    • ((?:bar)+): Match 1+ repetitions of bar in capture group #2
    • \w*: Match 0 or more word chars
  • )?: End non-capture group. ? makes this optional match
  • $: End

PS: The regex can be shortened to (\bfoo) .*?(?:((?:bar)+)\w*)?$ but it will be a bit slower.
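
The search-and-replace above can be sketched in Python as follows. This assumes the multiline flag is on so that $ matches at the end of every line, and it uses Python's \1 \2 replacement syntax in place of $1 $2. `PATTERN`, `TEXT` and `lines` are illustrative names only.

```python
import re

# re.MULTILINE lets $ match at the end of each line, not only at the end of the text.
PATTERN = re.compile(r"(\bfoo) .*?(?: \w*?((?:bar)+)\w*)?$", re.MULTILINE)

TEXT = (
    "foo ignoreme barbarbar\n"
    "foo ignoreme bar\n"
    "foo ignoreme \n"
    "foo something abcbar"
)

# In Python 3.5+ an unmatched group is substituted as an empty string,
# so a line without "bar" becomes "foo " with a trailing space.
replaced = PATTERN.sub(r"\1 \2", TEXT)

# Strip that trailing space so the result matches the expected output.
lines = [line.rstrip() for line in replaced.split("\n")]
for line in lines:
    print(line)
```

The final `rstrip()` is an addition for tidiness; the replacement string itself always emits the space between the two groups.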
