Regex Unexpected Behavior with optional groups

Time:06-23

So I have this expression

#(?<category>.+)(?:\/(?<id>.+))?

Which is supposed to capture the foo of #foo or capture both foo and bar of #foo/bar

However, it seems to match the entire rest of the string (foo/bar) as the category and capture it (RegexTester: bad).

Removing the last ? functions as expected, but, of course, the last part is no longer optional (RegexTester: good).

I don't understand why this happens. (This still happens without capture groups too)

CodePudding user response:

It's because .+ is greedy, and optional groups don't have to match.

First, #(?<category>.+) consumes the whole string. Then there's nothing left for (?:\/(?<id>.+))? to match, but it's not required to match anything, so the whole expression still succeeds.
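The behaviour is easy to reproduce. Below is a minimal sketch in Python, chosen only for illustration since the question doesn't name a language; note that Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`. The helper name `capture_greedy` is made up for this example.

```python
import re

# The pattern from the question, using Python's (?P<name>...) group syntax.
GREEDY = re.compile(r"#(?P<category>.+)(?:/(?P<id>.+))?")

def capture_greedy(text):
    """Return (category, id) as captured by the greedy pattern, or None."""
    m = GREEDY.search(text)
    return (m.group("category"), m.group("id")) if m else None

# The greedy .+ swallows "foo/bar", so the optional group matches nothing.
print(capture_greedy("#foo/bar"))  # ('foo/bar', None)
```

Because the optional group is allowed to match nothing, the engine never needs to backtrack out of `.+`, which is why "id" stays empty.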

There isn't a general technique to rewrite every regex that suffers from this issue, but there is a general approach to preventing it: write the preceding group so that it stops before the point where the optional group would match. In this instance, since a slash separates "category" from "id", you can have "category" not match slashes:

#(?<category>[^/]+)(?:\/(?<id>.+))?
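As a quick check of the corrected pattern, here is a small Python sketch (again using Python's `(?P<name>...)` spelling for named groups; `capture_fixed` is a hypothetical helper name, and Python is an assumption since the question doesn't say which engine is in use):

```python
import re

# "category" may not contain a slash, so it stops where the optional group starts.
FIXED = re.compile(r"#(?P<category>[^/]+)(?:/(?P<id>.+))?")

def capture_fixed(text):
    """Return (category, id) as captured by the corrected pattern, or None."""
    m = FIXED.search(text)
    return (m.group("category"), m.group("id")) if m else None

print(capture_fixed("#foo"))      # ('foo', None)
print(capture_fixed("#foo/bar"))  # ('foo', 'bar')
```

Note that "id" still uses `.+`, so any further slashes end up inside it: `#foo/bar/baz` gives `('foo', 'bar/baz')`.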

You might be tempted to use a lazy modifier, but this still won't work, as it will then match the shortest possible substring ("f") and the optional group will still have nothing to match.
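The lazy variant can be tried the same way. This is a sketch in Python, assuming the lazy version simply swaps `.+` for `.+?` in the "category" group (named groups are written `(?P<name>...)` in Python; `capture_lazy` is a made-up helper name):

```python
import re

# Same pattern as in the question, but with a lazy quantifier in "category".
LAZY = re.compile(r"#(?P<category>.+?)(?:/(?P<id>.+))?")

def capture_lazy(text):
    """Return (category, id) as captured by the lazy pattern, or None."""
    m = LAZY.search(text)
    return (m.group("category"), m.group("id")) if m else None

# .+? stops after one character; the optional group then sees "o", not "/".
print(capture_lazy("#foo/bar"))  # ('f', None)
```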
