I would like to extract some data from text that follows one of these patterns:
text1
text1|text2
text1|text2[text3]
text1|text2[text3] text4
(text1|text2[text3], text4)
text1[text3]
text1[text3], text4
So far I have managed to construct two expressions; when the first one fails, the code falls back on the second.
/\(?([^|[]*)\|?([^[]*)\[?(.*)\],?\s?([^)]*)\)?/
/([^|]*)\|?(.*)/
Perhaps there is a better way to parse it.
Is it possible to capture everything above with one regex?
Thanks for the help.
Example
const items = [
  "text1",
  "text1|text2",
  "text1|text2[text3]",
  "text1|text2[text3] text4",
  "(text1|text2[text3], text4)",
  "text1[text3]",
  "text1[text3], text4"
]
const parse = (text) => {
  const [_, text1, text2, text3, text4] = /\(?([^|[]*)\|?([^[]*)\[?(.*)\],?\s?([^)]*)\)?/.exec(text)
    || /([^|]*)\|?(.*)/.exec(text)
  return {
    text1,
    text2,
    text3,
    text4
  };
}
for (const text of items) {
  console.log(parse(text));
}
Answer:
You can use
const items = [
  "text1",
  "text1|text2",
  "text1|text2[text3]",
  "text1|text2[text3] text4",
  "(text1|text2[text3], text4)",
  "text1[text3]",
  "text1[text3], text4"
]
const parse = (text) => {
  // Groups 2-4 are optional, so they are undefined when that part is absent
  const [_, text1, text2, text3, text4] = /^\(?([^[|]+)(?:\|([^[]+))?(?:\[([^\][]*)](?:\s*(?:,\s*)?([^\s)].*?))?)?\)?$/.exec(text)
  return {
    text1,
    text2,
    text3,
    text4
  };
}
for (const text of items) {
  console.log(text, parse(text));
}
See the regex demo. Details:
^ - start of string
\(? - an optional (
([^[|]+) - Group 1: one or more chars other than [ and |
(?:\|([^[]+))? - an optional sequence of | and then Group 2: one or more chars other than [, as many as possible
(?:\[([^\][]*)](?:\s*(?:,\s*)?([^\s)].*?))?)? - an optional sequence of
  \[([^\][]*)] - [, zero or more chars other than [ and ] (captured into Group 3) and then a ]
  (?:\s*(?:,\s*)?([^\s)].*?))? - an optional sequence of
    \s* - zero or more whitespace chars
    (?:,\s*)? - an optional sequence of , and zero or more whitespace chars
    ([^\s)].*?) - Group 4: a char other than whitespace and ) and then zero or more chars other than line break chars, as few as possible
\)? - an optional )
$ - end of string.
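As a rough sketch, the same pattern can be written with named capture groups, which avoids relying on group positions. This assumes the quantifier after the first two character classes is +, as the "one or more chars" wording in the explanation suggests. The names PATTERN and parseNamed are made up for illustration, and returning null for non-matching input is an added assumption, not something the original code does.

```javascript
// Same structure as the regex explained above, with named groups.
// Group names mirror the variable names used in the question.
const PATTERN = /^\(?(?<text1>[^[|]+)(?:\|(?<text2>[^[]+))?(?:\[(?<text3>[^\][]*)\](?:\s*(?:,\s*)?(?<text4>[^\s)].*?))?)?\)?$/;

// Hypothetical helper: returns null instead of throwing when nothing matches.
const parseNamed = (text) => {
  const match = PATTERN.exec(text);
  if (match === null) return null;
  // Optional groups that did not participate in the match are undefined.
  const { text1, text2, text3, text4 } = match.groups;
  return { text1, text2, text3, text4 };
};

console.log(parseNamed("(text1|text2[text3], text4)")); // all four fields set
console.log(parseNamed("text1[text3]"));                // text2 and text4 undefined
console.log(parseNamed("[oops"));                       // null: Group 1 needs at least one char
```

Named groups need a runtime that supports ES2018 regex features. Without the null check, destructuring the result of a failed exec() call would throw a TypeError.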