Optional group in regular expression

Time:05-08

I have lines I'm trying to match that could be:

ACDNT: BLAHBLAH COUNTY, NC
ACDNT: BLAHBLAH COUNTY, NC PERS INJ
ACDNT: BLAHBLAH COUNTY, NC CMV
ACDNT: SOMEWHERE ELSE

So I want a regular expression that matches "ACDNT: ", a location with or without the NC county, and then either nothing, "PERS INJ", or "CMV". I want to capture the location and the 'extra' (PERS INJ or CMV) in groups.

I'm trying:

(ACDNT: )(.*)( (CMV|PERS INJ))?

with the test string:

ACDNT: SOMEWHERE PERS INJ

and regex101 (with the Java option) matches 'SOMEWHERE PERS INJ' as group 2. I was expecting "PERS INJ" to be in its own group.

I thought the trailing question mark would make the group enclosing the space and the last text optional. How would I alter the regular expression to do that?

To summarize, I want to match the location (whether it's an NC county or not) as its own group, then have an optional group that has one of the two 'extra' strings if they're there.

("a programmer had a problem and decided to solve it with regular expressions. Now he has two problems...")

CodePudding user response:

Try (ACDNT: )(.*?)( (CMV|PERS INJ))?$

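Your problem is that .* is greedy and consumes the entire rest of the string, which is why you're seeing "SOMEWHERE PERS INJ" all in the same group. Because the trailing group is optional, the matcher has no reason to backtrack and give any of that text back. I changed * to *? to make it reluctant instead of greedy, and I added $ at the end to force the matcher to consider the whole string. Without the $, a find-style search would let the reluctant .*? match nothing at all and simply skip the optional group.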
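
Here is a minimal Java sketch of the difference between the two patterns, assuming the lines are searched with Matcher.find(). The class and method names (AcdntGroups, parse) are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AcdntGroups {
    // Original attempt: the greedy .* swallows the optional suffix.
    static final Pattern GREEDY =
            Pattern.compile("(ACDNT: )(.*)( (CMV|PERS INJ))?");
    // Suggested fix: reluctant .*? plus $ so the match must reach the end.
    static final Pattern RELUCTANT =
            Pattern.compile("(ACDNT: )(.*?)( (CMV|PERS INJ))?$");

    // Returns {location, extra}; extra is null when the optional group
    // did not participate. Returns null when the line does not match.
    static String[] parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "ACDNT: SOMEWHERE PERS INJ";
        String[] greedy = parse(GREEDY, line);
        String[] reluctant = parse(RELUCTANT, line);
        System.out.println(greedy[0] + " | " + greedy[1]);       // SOMEWHERE PERS INJ | null
        System.out.println(reluctant[0] + " | " + reluctant[1]); // SOMEWHERE | PERS INJ
    }
}
```

Group 2 is the location and group 4 is the 'extra'. Group 3 also includes the leading space, so it is usually not the one you want to read.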

There are still some caveats. Note that an input consisting of "ACDNT: " followed by any string will still be a successful match. You could help address this by being more specific with what's allowed for the location instead of .*.
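
As one possible way to tighten the location, the sketch below replaces .* with a character class. It assumes locations consist only of uppercase letters, spaces, and commas, which is a guess based on the sample lines and may not hold for your real data. The names (AcdntStrict, parse) are hypothetical. It uses Matcher.matches(), so the whole line must fit the pattern, and a non-capturing group for the space, so the location is group 1 and the 'extra' is group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AcdntStrict {
    // Assumption: a location starts with an uppercase letter and contains
    // only uppercase letters, spaces, and commas.
    static final Pattern STRICT =
            Pattern.compile("ACDNT: ([A-Z][A-Z ,]*?)(?: (CMV|PERS INJ))?");

    // Returns {location, extra}; extra is null when absent.
    // Returns null when the whole line does not fit the pattern.
    static String[] parse(String line) {
        Matcher m = STRICT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] ok = parse("ACDNT: BLAHBLAH COUNTY, NC CMV");
        System.out.println(ok[0] + " | " + ok[1]);       // BLAHBLAH COUNTY, NC | CMV
        System.out.println(parse("ACDNT: 12345"));       // null
    }
}
```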
