So I have this expression:
#(?<category>.+)(?:\/(?<id>.+))?
which is supposed to capture "foo" of "#foo", or capture both "foo" and "bar" of "#foo/bar".
However, it seems to match the entire rest of the string as the category and capture it (for "#foo/bar", the category comes out as "foo/bar").
Removing the last ? makes it work as expected, but, of course, then the last part is no longer optional.
I don't understand why this happens. (It still happens without the capture groups, too.)
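To reproduce the problem, here is a minimal Python sketch. Python's re module spells named groups as (?P<name>...) rather than (?<name>...), so the pattern is transcribed to that syntax; the helper name parse_tag is hypothetical.

```python
import re

# The question's pattern, transcribed to Python's (?P<name>...) syntax.
GREEDY = re.compile(r"#(?P<category>.+)(?:/(?P<id>.+))?")

def parse_tag(pattern, text):
    """Hypothetical helper: return (category, id), or None if no match."""
    m = pattern.match(text)
    return (m.group("category"), m.group("id")) if m else None

print(parse_tag(GREEDY, "#foo"))      # ('foo', None)
print(parse_tag(GREEDY, "#foo/bar"))  # ('foo/bar', None): the slash ends up in category
```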
Answer:
It's because .+ is greedy, and optional groups don't have to match anything.
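First, #(?<category>.+) consumes the whole rest of the string. Then there's nothing left for (?:\/(?<id>.+))? to match, but because it's optional it doesn't need to match anything, so the whole expression succeeds without the engine ever backtracking. When the group is required (no trailing ?), the engine is forced to give characters back from .+ one at a time until /bar can match, which is why removing the ? appears to fix it.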
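The backtracking behaviour can be checked directly. In this sketch (Python syntax again; REQUIRED is a hypothetical name), the same pattern without the trailing ? makes the engine give characters back from .+ until "/bar" can match, while "#foo" on its own no longer matches at all.

```python
import re

# Same pattern, but the slash/id group is mandatory (no trailing "?").
REQUIRED = re.compile(r"#(?P<category>.+)(?:/(?P<id>.+))")

m = REQUIRED.match("#foo/bar")
# .+ first grabs "foo/bar", finds no "/" after it, and backtracks
# one character at a time until "/bar" is left for the id group.
print(m.group("category"), m.group("id"))  # foo bar

# Without a slash there is nothing for the required group to match.
print(REQUIRED.match("#foo"))  # None
```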
There isn't a general technique to rewrite every regex that suffers from this issue, but there is a general approach to preventing it: write the preceding group so that it stops before the text the optional group would match. In this instance, since the optional group starts with a forward slash, you can have "category" not match slashes:
#(?<category>[^/]+)(?:\/(?<id>.+))?
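A quick check of the fixed pattern, as a Python sketch (named-group syntax transcribed to (?P<name>...); FIXED is a hypothetical name). Because category can't contain a slash, it must stop at the first one, leaving the slash for the optional group.

```python
import re

# category may not contain "/", so it stops at the first slash,
# leaving the slash and everything after it for the optional id group.
FIXED = re.compile(r"#(?P<category>[^/]+)(?:/(?P<id>.+))?")

for text in ["#foo", "#foo/bar", "#foo/bar/baz"]:
    m = FIXED.match(text)
    print(text, "->", m.group("category"), m.group("id"))
# #foo -> foo None
# #foo/bar -> foo bar
# #foo/bar/baz -> foo bar/baz
```

Note that any further slashes end up in "id", since that group still uses .+.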
You might be tempted to use a lazy quantifier (.+?) instead, but that won't work either: "category" then matches the shortest possible substring ("f"), the next character is "o" rather than a slash, and so the optional group again has nothing to match.
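The lazy variant can be checked the same way. This Python sketch (LAZY is a hypothetical name) shows the match stopping after a single character.

```python
import re

# Lazy quantifier: category now stops as early as possible.
LAZY = re.compile(r"#(?P<category>.+?)(?:/(?P<id>.+))?")

m = LAZY.match("#foo/bar")
# .+? matches just "f"; the optional group then sees "o", not "/",
# so it matches nothing and the overall match ends there.
print(m.group("category"), m.group("id"))  # f None
print(m.group(0))  # #f
```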