Regex capture optional groups

Time:11-21

I'm trying to capture 2 groups of numbers, where each group is optional and should only be captured if it contains numbers. Here is a list of all valid combinations that the regex is supposed to match:

  1. 123(456)
  2. 123
  3. (456)
  4. abc(456)
  5. 123(efg)

And these are not valid combinations and should not be matched:

  1. abc(efg)
  2. abc
  3. (efg)

However, my regex fails on combinations #4 and #5 even though they contain numbers.

const list = ["123(456)", "123", "(456)", "abc(456)", "123(def)", "abc(def)", "abc", "(def)"];
const regex = /^(?:(\d+))?(?:\((\d+)\))?$/;

list.map((a,i) => console.log(i+1+". ", a + "=>".padStart(11-a.length," "), JSON.stringify((a.match(regex)||[]).slice(1))));

So the question is: why doesn't a ? placed after a group make the regex "skip" that group when nothing matches?

P.S. This regex also captures #4, but not #5: /(?:^|(\d+)?)(?:\((\d+)\))?$/

CodePudding user response:

What you're looking for can be done with lookaheads:

(?=^\d+(?:\(|$))(\d+)|(?=\d+\)$)(\d+)

Rough translation: a number from the start ending with a bracket (or end of line) OR a number in brackets somewhere in the text
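
Because the two alternatives match at different positions, collecting both numbers takes more than one match. Here is a minimal sketch of one way to do that, assuming the pattern is written with + after each \d and run with the global flag; the helper name captureBoth is made up for illustration.

```javascript
// The lookahead pattern from this answer, with the global flag added
// so that both alternatives can be found in a single pass.
const lookahead = /(?=^\d+(?:\(|$))(\d+)|(?=\d+\)$)(\d+)/g;

// Hypothetical helper: returns [leadingNumber, bracketedNumber],
// using null for a part that was not found.
function captureBoth(text) {
  let first = null;
  let second = null;
  for (const m of text.matchAll(lookahead)) {
    if (m[1] !== undefined) first = m[1];   // number at the start
    if (m[2] !== undefined) second = m[2];  // number inside (...) at the end
  }
  return [first, second];
}

console.log(captureBoth("123(456)")); // [ '123', '456' ]
console.log(captureBoth("abc(456)")); // [ null, '456' ]
console.log(captureBoth("123(def)")); // [ '123', null ]
console.log(captureBoth("abc(def)")); // [ null, null ]
```

An input with no digits produces no matches at all, so both entries stay null.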

To answer the question about optional capture groups:

Yes, marking a group as optional, e.g. (A*)?, does make the whole group optional. In your case the regex as a whole simply does not match: a skipped group consumes no characters, so for abc(456) the anchored pattern still has nothing that can match abc between ^ and the opening parenthesis (you can verify this with a regex debugger).
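
To make that concrete, here is a small sketch using the pattern from the question (assuming its quantifiers are \d+). It shows that the groups really are skipped when that lets the rest of the pattern match, and that the failure on abc(456) comes from the unmatched letters, not from the ?.

```javascript
// The pattern from the question, with the + quantifiers written out.
const anchored = /^(?:(\d+))?(?:\((\d+)\))?$/;

// Both groups skipped: the empty string still matches, so "?" really does skip.
const emptyMatch = "".match(anchored);
console.log(emptyMatch !== null); // true

// Second group skipped: "123" matches and group 2 is undefined.
const digitsOnly = "123".match(anchored);
console.log(digitsOnly[1], digitsOnly[2]); // 123 undefined

// Group 1 is skipped here too, but nothing in the pattern can consume "abc",
// and ^ pins the match to index 0, so the whole match fails.
const lettersFirst = "abc(456)".match(anchored);
console.log(lettersFirst); // null
```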

CodePudding user response:

@WiktorStribiżew and @akash had good ideas, but they rely on the global flag, which requires an additional loop to gather all the matches.

For now, I came up with this regex, which matches anything but captures only what I need.

const list = ["123(456)", "123", "(456)", "abc(456)", "123(def)", "abc(def)", "abc", "(def)"];
const regex = /(?:(\d+)|^|[^(]+)+?(?:\((?:(\d+)|\D*)\)|$)+?/;

list.map((a,i) => console.log(i+1+". ", a + "=>".padStart(11-a.length," "), JSON.stringify((a.match(regex)||[]).slice(1))));

CodePudding user response:

Here is an idea that works without the global flag and is meant to match only the needed items:

^(?=\D*\d)(\d+)?\D*(?:\((\d*)\))?\D*$
  • ^(?=\D*\d) the lookahead at the ^ start checks that there is at least one digit
  • (\d+)? captures the leading digits into the optional first group
  • \D* is followed by any number of non-digits
  • (?:\((\d*)\))? captures digits in parentheses into the optional second group
  • \D*$ matches any number of \D non-digits up to the $ end

See your JS demo or a demo at regex101 (the [^\d\n] is only there for the multiline demo)
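
As a rough check, the pattern can be run against the list from the question with a single match call and no global flag. This is only a sketch: the helper name capture and the use of null for a missing group are choices made here for illustration.

```javascript
const list = ["123(456)", "123", "(456)", "abc(456)", "123(def)", "abc(def)", "abc", "(def)"];

// The lookahead requires at least one digit somewhere in the string;
// the two capture groups are each optional.
const pattern = /^(?=\D*\d)(\d+)?\D*(?:\((\d*)\))?\D*$/;

// Hypothetical helper: returns [group1, group2] with null for a group
// that did not participate, or null when the string does not match at all.
function capture(text) {
  const m = text.match(pattern);
  return m ? [m[1] ?? null, m[2] ?? null] : null;
}

for (const item of list) {
  console.log(item.padEnd(10), JSON.stringify(capture(item)));
}
// 123(456) and 123 and (456) and abc(456) and 123(def) match;
// abc(def), abc and (def) return null because they contain no digit.
```

Backtracking does the work for inputs like abc(456): the first \D* initially swallows the opening parenthesis, then gives it back so the parenthesized group can match.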
