Regular expression: capture group titles (between square brackets) and the content of each group

Time:11-04

Given a file that contains a variable number of 'groups', where each group has some content:

  • a line can contain a title between square brackets
  • below each group title comes the content of the group. The content may contain any characters, but a content line must not begin with '[' (a line that starts with '[' is a group title)

Below is an example of a file following these rules:

[titleOfgroup1]
foo
faafaa [ddfdf]

fii
[title of group2]

faa fii@@
<tag1>fuu</tag1>

    foo1234

wdw

dwd

[title of [group3]]
faa faa
[titleOfGroup4]

fiifoo

I want to capture, with a regex, every group title and the content of each group. The expected result:

GROUP 1 : 
    MATCH 1 : 'titleOfgroup1'
    MATCH 2 : 'foo
faafaa [ddfdf]

fii'

GROUP 2 : 
    MATCH 1 : 'title of group2'
    MATCH 2 : 'faa fii@@
<tag1>fuu</tag1>

    foo1234'

GROUP 3 : 
    MATCH 1 : 'title of [group3]'
    MATCH 2 : 'faa faa'

GROUP 4 : 
    MATCH 1 : 'titleOfGroup4'
    MATCH 2 : 'fiifoo'

I tried several regexes and I'm close to a solution, but I'm stuck. My latest attempt is: ^\[(.*)\]\n[\s]*([\S\s]*?)[\s]*(?=\n\[)

How can I get the last group? Thanks for any help!

(PS: I'm looking for a regex that works in both JavaScript and PHP.)

CodePudding user response:

You are not matching the last line, because the last part in your pattern (?=\n\[) asserts that there must be a newline followed by [ present.
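To make the failure concrete, here is a small JavaScript sketch that runs the pattern from the question against a shortened, made-up input. The `g` and `m` flags are an assumption, since the question does not show which flags were used.

```javascript
// Hypothetical two-group input, shortened from the question's example.
const shortSample = "[a]\nfoo\n[b]\nbar";

// Pattern from the question; the g and m flags are assumed here.
const originalPattern = /^\[(.*)\]\n[\s]*([\S\s]*?)[\s]*(?=\n\[)/gm;

// Collect [title, content] pairs for every match.
const originalMatches = Array.from(
  shortSample.matchAll(originalPattern),
  (m) => [m[1], m[2]]
);

// Only the first group is found: nothing follows "bar",
// so the lookahead (?=\n\[) can never succeed for "[b]".
console.log(originalMatches); // [ [ 'a', 'foo' ] ]
```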

What you could do instead is capture the title in group 1, and then in group 2 match all lines that do not start with [, using a negative lookahead after matching a newline.

^\[(.+)\]\n((?:(?!\[).*(?:\n|$))*)
  • ^ Start of the string (or of each line when the multiline flag is set, which is needed to match every group)
  • \[(.+)\] Match [ capture the title in group 1 and match ]
  • \n Match a newline (or \r?\n)
  • ( Capture group 2
    • (?: Non capture group to repeat as a whole part
      • (?!\[) Negative lookahead, assert not [ to the right
      • .*(?:\n|$) Match the whole line and either a newline or assert the end of the string
  • )* Close the non capture group and optionally repeat it to match all lines
  • ) Close group 2
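As a rough usage sketch, the pattern can be applied in JavaScript with `matchAll` and the `g` and `m` flags. The `parseGroups` helper and the `trim()` call are illustrative assumptions rather than part of the answer: group 2 keeps the blank lines around the content, so trimming is one way to get the clean values shown in the question.

```javascript
// Example file from the question, joined with "\n" line endings.
const exampleInput = [
  "[titleOfgroup1]", "foo", "faafaa [ddfdf]", "", "fii",
  "[title of group2]", "", "faa fii@@", "<tag1>fuu</tag1>", "",
  "    foo1234", "", "wdw", "", "dwd", "",
  "[title of [group3]]", "faa faa",
  "[titleOfGroup4]", "", "fiifoo",
].join("\n");

// g: find every group; m: let ^ match at the start of each line.
const groupPattern = /^\[(.+)\]\n((?:(?!\[).*(?:\n|$))*)/gm;

// Hypothetical helper: returns one { title, content } object per group.
function parseGroups(text) {
  return Array.from(text.matchAll(groupPattern), (m) => ({
    title: m[1],
    // Group 2 includes leading and trailing blank lines; trim them off.
    content: m[2].trim(),
  }));
}

const groups = parseGroups(exampleInput);
for (const { title, content } of groups) {
  console.log(`${title} => ${JSON.stringify(content)}`);
}
```

For files with Windows line endings, the `\n` parts of the pattern would need to become `\r?\n`, as the answer notes. The same pattern should also work in PHP with `preg_match_all` and the `m` modifier.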

