How to make the middle capture group work when surrounded by wildcards?

Time:11-19

I use Regex quite a bit. I'm no master, but I've surprised myself with how difficult this has been.


We have a Regex string like this:

^(?:remind me ).*? (to|that|about|its|it's)? ?(.*)$

I want it to match both of the following strings, and assign some value to the first capture group.

  • remind me in 24 hours test
  • remind me in 24 hours to test

Assigning this little "to" to the first capture group is proving very difficult.


I could work around this by doing two passes like below and then checking whether the result is null, but that seems like madness, so I'm hoping to learn a better approach.

const regex1 = /^(?:remind me ).*? (to|that|about|its|it's)? ?(.*)$/i
const regex2 = /(to|that|about|its|it's) ?(.*)$/i

const matches1 = 'remind me in 24 hours to test'.match(regex1)[2]
const matches2 = matches1.match(regex2)

console.log(matches2)
// String1 output: null
// String2 output: [ 'to test', 'to', 'test', index: 9, input: '24 hours to test', groups: undefined ]

On related questions:

I've seen numerous other questions about this - but none of the "solutions" seem applicable here, as most of the answers are tailored to the user's specific issue, and I haven't been able to figure out how to fix our issue using them as a reference.


CodePudding user response:

It works if you remove the optional quantifier from the first capturing group and put .*? together with the capture group into another non-capturing group and make this outer group optional:

^remind me +(?:.*?\b(to|that|about|its|it's)\b *)?(.*)$

See this demo at regex101. (I also made a few small changes: adding word boundaries, changing the quantifiers to allow variable spacing, and removing the non-capturing group at the start, which looks unneeded.)
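As a quick check, here is a minimal sketch of how the restructured pattern behaves on the two strings from the question. It assumes "remind me" is followed by one or more spaces (written as ` +`); the helper name parseReminder and the keyword/rest field names are made up for illustration.

```javascript
// Assumption: "remind me" is followed by one or more spaces (" +").
const remindRe = /^remind me +(?:.*?\b(to|that|about|its|it's)\b *)?(.*)$/i;

// Hypothetical helper: returns the optional keyword and the remaining text.
function parseReminder(input) {
  const m = input.match(remindRe);
  if (m === null) return null;
  return { keyword: m[1], rest: m[2] };
}

const withKeyword = parseReminder('remind me in 24 hours to test');
const withoutKeyword = parseReminder('remind me in 24 hours test');

console.log(withKeyword);    // { keyword: 'to', rest: 'test' }
console.log(withoutKeyword); // { keyword: undefined, rest: 'in 24 hours test' }
```

When no keyword is present, the whole optional group is skipped, so the time expression ends up in the last group instead of being swallowed by .*?.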


To understand why this works, first have a look at the simple pattern (a)? and how this results in one capture of a and three empty matches in abc while getting four empty matches in e.g. xyz.
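The match counts described above can be reproduced outside regex101 with a small sketch using String.prototype.matchAll; the helper name describeMatches is only for illustration.

```javascript
// An optional group on its own: it matches "a" when it can, and the empty string otherwise.
const optionalA = /(a)?/g;

// Hypothetical helper: lists every match with its position, text and first group.
function describeMatches(input) {
  return Array.from(input.matchAll(optionalA), (m) => ({
    index: m.index,
    text: m[0],
    group1: m[1],
  }));
}

const abcMatches = describeMatches('abc');
const xyzMatches = describeMatches('xyz');

console.log(abcMatches); // one match with text 'a' at index 0, then empty matches at 1, 2 and 3
console.log(xyzMatches); // empty matches at 0, 1, 2 and 3; group1 is always undefined
```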

Simplify your current pattern to e.g. ^a.*?(b)?(.*), then investigate it in the regex101 debugger and click the matches tab on the left side. For the string abc the regex engine first matches a. The lazy .*? matches nothing at first, the next character b matches the optional group, and the capture succeeds. Using the same pattern on another string acbc, after matching the first a the next character is a c. Because (b)? is optional, it matches the empty string between a and the adjacent c (click around step 7 at match 2), so nothing gets captured and (.*) consumes the rest.

But refactor this pattern to ^a(?:.*?(b))?(.*) and look into the debugger again (watch steps 3 to 12): at the same position after the first a, the grouped (?:.*?(b))? part now fits for both test strings. Because b is required inside the optional group, the lazy .*? has to expand until it reaches a b, so the first group captures it before the pattern proceeds.
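The difference between the two simplified patterns can also be checked directly in JavaScript. This is only a sketch; the names flat, nested and groupsOf are made up for illustration.

```javascript
const flat = /^a.*?(b)?(.*)/;       // optional group placed after the lazy .*?
const nested = /^a(?:.*?(b))?(.*)/; // lazy .*? moved inside the optional group

// Hypothetical helper: returns [group1, group2] for a pattern and input.
function groupsOf(re, input) {
  const m = input.match(re);
  return m === null ? null : [m[1], m[2]];
}

const flatAbc = groupsOf(flat, 'abc');       // [ 'b', 'c' ]
const flatAcbc = groupsOf(flat, 'acbc');     // [ undefined, 'cbc' ]
const nestedAbc = groupsOf(nested, 'abc');   // [ 'b', 'c' ]
const nestedAcbc = groupsOf(nested, 'acbc'); // [ 'b', 'c' ]

console.log({ flatAbc, flatAcbc, nestedAbc, nestedAcbc });
```

Only the flat pattern loses the b in acbc, because there the optional group is tried once at the position right after a and is then allowed to match nothing.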


With your current pattern there are even some strings for which the first group does capture (demo).
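One such case, as a sketch: when the keyword directly follows the first word after "remind me", the lazy .*? stops at the right place by accident. The input 'remind me in to test' is a made-up example, not necessarily the one used in the linked demo.

```javascript
// The original pattern from the question, unchanged.
const originalRe = /^(?:remind me ).*? (to|that|about|its|it's)? ?(.*)$/i;

// Made-up input: the keyword comes right after the first space that .*? can stop at.
const lucky = 'remind me in to test'.match(originalRe);
// Input from the question: .*? stops after "in", where no keyword follows.
const unlucky = 'remind me in 24 hours to test'.match(originalRe);

console.log(lucky[1], '|', lucky[2]);     // to | test
console.log(unlucky[1], '|', unlucky[2]); // undefined | 24 hours to test
```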
