Regex pattern for multiple optional capture groups

Time:10-05

I would like to extract some data from text that follows these patterns:

text1
text1|text2
text1|text2[text3]
text1|text2[text3] text4
(text1|text2[text3], text4)
text1[text3]
text1[text3], text4

So far I have managed to construct two expressions; when the first one fails, the code falls back on the second:

/\(?([^|[]*)\|?([^[]*)\[?(.*)\],?\s?([^)]*)\)?/

/([^|]*)\|?(.*)/

Perhaps there is a better way to parse it.

Is it possible to capture everything above with one regex?

Thanks for the help.

Example

const items = [
"text1",
"text1|text2",
"text1|text2[text3]",
"text1|text2[text3] text4",
"(text1|text2[text3], text4)",
"text1[text3]",
"text1[text3], text4"
]

const parse = (text) => {
  // Try the full pattern first; if it finds no match, fall back to the simpler one.
  const [_, text1, text2, text3, text4] =
    /\(?([^|[]*)\|?([^[]*)\[?(.*)\],?\s?([^)]*)\)?/.exec(text) ||
    /([^|]*)\|?(.*)/.exec(text);

  return {
    text1,
    text2,
    text3,
    text4
  };
};

for(const text of items) {
   console.log(parse(text));
}
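For comparison, here is a minimal sketch (assuming Node.js, and reusing the question's two patterns under the hypothetical names `first`, `fallback` and `parseWithFallback`) that shows one side effect of the fallback approach: a missing part comes back as an empty string for some inputs and as `undefined` for others, depending on which regex matched.

```javascript
// Hypothetical names; the two patterns are copied from the question.
const first = /\(?([^|[]*)\|?([^[]*)\[?(.*)\],?\s?([^)]*)\)?/;
const fallback = /([^|]*)\|?(.*)/;

const parseWithFallback = (text) => {
  // `first` can only match when the input contains a literal "]".
  const [, text1, text2, text3, text4] = first.exec(text) || fallback.exec(text);
  return { text1, text2, text3, text4 };
};

// No "]": the fallback matches, so text2 is "" and text3/text4 are undefined.
console.log(parseWithFallback("text1"));
// Has "]": the first pattern matches, so the absent text2 and text4 are "".
console.log(parseWithFallback("text1[text3]"));
```

Callers of this version therefore have to treat both `""` and `undefined` as "not present".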

Answer:

You can use

const items = [
"text1",
"text1|text2",
"text1|text2[text3]",
"text1|text2[text3] text4",
"(text1|text2[text3], text4)",
"text1[text3]",
"text1[text3], text4"
]

const parse = (text) => {
  const [_, text1, text2, text3, text4] = /^\(?([^[|]+)(?:\|([^[]+))?(?:\[([^\][]*)](?:\s*(?:,\s*)?([^\s)].*?))?)?\)?$/.exec(text);

  return {
    text1,
    text2,
    text3,
    text4
  };
};

for(const text of items) {
   console.log(text, parse(text));
}
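The same pattern can also be written with named capture groups (available since ES2018), which keeps each field name next to the part of the regex that fills it. This is a sketch assuming a Node.js or modern browser runtime; `parseNamed` is a hypothetical helper name, the `]` after Group 3 is escaped as `\]` (it matches the same character), and unlike the code above it returns `null` instead of throwing a `TypeError` when `exec()` finds no match, for example on an empty string.

```javascript
// Same structure as the answer's regex, with named groups instead of numbered ones.
const pattern =
  /^\(?(?<text1>[^[|]+)(?:\|(?<text2>[^[]+))?(?:\[(?<text3>[^\][]*)\](?:\s*(?:,\s*)?(?<text4>[^\s)].*?))?)?\)?$/;

const parseNamed = (text) => {
  const match = pattern.exec(text);
  // exec() returns null on failure; destructuring null would throw a TypeError.
  if (match === null) return null;
  // Named groups that did not participate in the match are undefined.
  const { text1, text2, text3, text4 } = match.groups;
  return { text1, text2, text3, text4 };
};

console.log(parseNamed("(text1|text2[text3], text4)"));
console.log(parseNamed("text1[text3]"));
console.log(parseNamed("")); // null
```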

See the regex demo. Details:

  • ^ - start of string
  • \(? - an optional (
  • ([^[|]+) - Group 1: one or more chars other than [ and |
  • (?:\|([^[]+))? - an optional sequence of | and then Group 2: one or more chars other than [, as many as possible
  • (?:\[([^\][]*)](?:\s*(?:,\s*)?([^\s)].*?))?)? - an optional sequence of
    • \[([^\][]*)] - [, zero or more chars other than [ and ] (captured into Group 3) and then a ]
    • (?:\s*(?:,\s*)?([^\s)].*?))? - an optional sequence of
      • \s* - zero or more whitespace chars
      • (?:,\s*)? - an optional sequence of , and zero or more whitespace chars
      • ([^\s)].*?) - Group 4: a char other than whitespace and ) and then zero or more chars other than line break chars, as few as possible
  • \)? - an optional )
  • $ - end of string.
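The lazy `.*?` in Group 4 matters because the closing `\)?` is optional. The short sketch below (the names `lazy`, `greedy` and `input` are illustrative) compares the answer's pattern with a copy that uses a greedy `.*`. With the greedy version, `.*` runs to the end of the string, `\)?` then matches nothing, and the trailing `)` ends up inside Group 4.

```javascript
// Identical patterns except for the quantifier at the end of Group 4.
const lazy = /^\(?([^[|]+)(?:\|([^[]+))?(?:\[([^\][]*)](?:\s*(?:,\s*)?([^\s)].*?))?)?\)?$/;
const greedy = /^\(?([^[|]+)(?:\|([^[]+))?(?:\[([^\][]*)](?:\s*(?:,\s*)?([^\s)].*))?)?\)?$/;

const input = "(text1|text2[text3], text4)";
console.log(lazy.exec(input)[4]);   // prints: text4
console.log(greedy.exec(input)[4]); // prints: text4)  (the optional \)? matched nothing)
```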