capture group negative lookahead ignored by regex

Time:11-29

I have a long text file that contains several headers.
I need to split it so that each header and its content form a separate section.
Repeated headers should be treated as a single section. Minimal example:

HeaderA
example text

HeaderB
example text

HeaderC
example text

HeaderC
example text

HeaderD
example text

I managed to do that in Python with this regular expression:

Header(\w)[\s\S]*?(?=Header(?!\1)|$)

Note that both HeaderC sections are captured together as a single match.
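For reference, here is a minimal sketch of running the pattern against the sample above with Python's re module. The names text, pattern and sections are only illustrative and are not from the original post. It assumes default flags, so $ matches only at the end of the string (or just before a final newline).

```python
import re

# Sample input from the post; the two HeaderC blocks are adjacent.
text = """HeaderA
example text

HeaderB
example text

HeaderC
example text

HeaderC
example text

HeaderD
example text
"""

# Header(\w)      : literal "Header" plus one word character, saved as group 1
# [\s\S]*?        : lazily consume anything, including newlines
# (?=Header(?!\1)|$) : stop before a "Header" NOT followed by the same
#                      character as group 1, or at the end of the string
pattern = re.compile(r"Header(\w)[\s\S]*?(?=Header(?!\1)|$)")

# Collect (header letter, full matched section) pairs.
sections = [(m.group(1), m.group(0)) for m in pattern.finditer(text)]

for letter, body in sections:
    print(f"--- Header{letter} ---")
    print(body.strip())
```

Since (\w) captures exactly one character, the backreference in the lookahead only compares the first character that follows "Header".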


Here is what my regex matches for Header 3 (screenshot not included).
