Given a file that contains a variable number of groups, where each group has a title and some content:
- a title sits on its own line, enclosed in square brackets
- below each group title comes the content of the group. The content has no restriction on which characters it may contain, except that a content line must not begin with '[' (a line that starts with '[' is a group title)
Below is an example of a file following these rules:
[titleOfgroup1]
foo
faafaa [ddfdf]
fii
[title of group2]
faa fii@@
<tag1>fuu</tag1>
foo1234
wdw
dwd
[title of [group3]]
faa faa
[titleOfGroup4]
fiifoo
I'm looking to capture, with a regex, every group title and the content of each captured group. The expected result after the regex has run:
GROUP 1 :
MATCH 1 : 'titleOfgroup1'
MATCH 2 : 'foo
faafaa [ddfdf]
fii'
GROUP 2 :
MATCH 1 : 'title of group2'
MATCH 2 : 'faa fii@@
<tag1>fuu</tag1>
foo1234'
GROUP 3 :
MATCH 1 : 'title of [group3]'
MATCH 2 : 'faa faa'
GROUP 4 :
MATCH 1 : 'titleOfGroup4'
MATCH 2 : 'fiifoo'
I tried some regexes and I'm close to the solution, but I'm stuck. My last attempt is this regex: ^\[(.*)\]\n[\s]*([\S\s]*?)[\s]*(?=\n\[)
How can I get the last group? Thanks for any help!
(PS: I'm looking for a regex that works in both JavaScript and PHP.)
CodePudding user response:
You are not matching the last group because the last part of your pattern, (?=\n\[), asserts that a newline followed by [ must be present.
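The effect can be reproduced on a tiny input. This is only a sketch: the g and m flags and the two-group sample string are assumptions, since the question does not state which flags were used.

```javascript
// The pattern from the question; the g and m flags are assumed here.
const original = /^\[(.*)\]\n[\s]*([\S\s]*?)[\s]*(?=\n\[)/gm;

// Hypothetical two-group sample: the second group is the last one in the text.
const sample = '[a]\nfoo\n[b]\nbar';

// Collect the captured titles. No "\n[" follows the last group,
// so the lookahead fails there and that group is skipped.
const titles = Array.from(sample.matchAll(original), (m) => m[1]);
console.log(titles);
```

Only the title of the first group is collected, because nothing after the second group satisfies the lookahead.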
What you could do instead is capture the title in group 1, and then in group 2 match all lines that do not start with [, using a negative lookahead after matching a newline.
^\[(.+)\]\n((?:(?!\[).*(?:\n|$))*)
- ^ Start of the string (or of a line, when the multiline flag is enabled)
- \[(.+)\] Match [, capture the title in group 1 and match ]
- \n Match a newline (or use \r?\n)
- ( Capture group 2
  - (?: Non-capturing group, repeated as a whole
    - (?!\[) Negative lookahead, assert that there is no [ directly to the right
    - .*(?:\n|$) Match the whole line and either a newline or assert the end of the string
  - )* Close the non-capturing group and optionally repeat it to match all lines
- ) Close group 2
See a regex demo.
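As a rough illustration of how the pattern might be used from JavaScript, here is a minimal sketch. The helper name parseGroups, the shortened sample input, and the choice to trim the captured content are assumptions made for the example, not part of the answer above.

```javascript
// Shortened sample input that follows the rules from the question.
const input = [
  '[titleOfgroup1]',
  'foo',
  'faafaa [ddfdf]',
  'fii',
  '[title of [group3]]',
  'faa faa',
  '[titleOfGroup4]',
  'fiifoo',
].join('\n');

// g: find every group; m: let ^ and $ match at line boundaries.
const pattern = /^\[(.+)\]\n((?:(?!\[).*(?:\n|$))*)/gm;

// parseGroups is a hypothetical helper name used only for this sketch.
function parseGroups(text) {
  return Array.from(text.matchAll(pattern), (m) => ({
    title: m[1],
    // Group 2 includes the newline after each content line, so the
    // surrounding whitespace is trimmed here (an assumption about the
    // desired output; drop trim() to keep the raw capture).
    content: m[2].trim(),
  }));
}

const groups = parseGroups(input);
console.log(groups);
```

The last group is captured as well, because the content part no longer requires a following title line. In PHP the same pattern should work with preg_match_all and the m modifier, for example with the PREG_SET_ORDER flag to get one array per group.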