Regex - Capture multiple multiline text blocks with only a starting pattern

Time:09-26

I have a very large text file with several entries like this:

    -------------------------------------
    
       LOTS OF
        MULTILINE
       TEXT
    
    *************************************
              MORE
       MULTILINE
         TEXT
    
    *************************************
    
       EVEN MORE
    
    *************************************

    -------------------------------------

       2ND LOT OF
        MULTILINE
       TEXT
    
    *************************************
      MORE
       MULTILINE
         TEXT FOR 2ND LOT
    
    *************************************
    
       EVEN MORE TEXT FOR 2ND

    *************************************

Note that these are only two entries. I don't care about the asterisks, only about the text that follows the dashed line.

I want to get a capture group with all the text in each entry so that I can analyze it later line by line.

I can capture the first entry with an expression like this:

/-{37}\s*([\s\S]+)-{37}/gm

But I'm having trouble running the capture group several times because I don't have a clear terminator for the groups (since the *{37} line appears several times).

Here's a regex 101 example:

https://regex101.com/r/XZQ5h6/1

How can I capture the text after the dashed line but before the next dashed line or the end of the file?

CodePudding user response:

You can use this regex:

-{37}\R+((?:.+\R)+)

RegEx Demo

RegEx Details:

  • -{37}: Match a run of 37 hyphens
  • \R+: Match 1+ line breaks
  • (: Start capture group
    • (?:.+\R)+: Match a line of 1+ characters followed by a line break. Repeat this group 1+ times to match multiple such lines
  • ): End capture group
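
For readers who want to try this outside a regex tester, here is a minimal Python sketch of the same idea. It rests on a few assumptions: Python's `re` module does not support `\R`, so `(?:\r?\n)` is swapped in as the line break; the name `capture_entries` and the sample text are made up for illustration. Because `.+` needs at least one character, the capture ends at the first completely empty line, so blank lines inside an entry must contain at least a space for the whole entry to be captured.

```python
import re

# Assumption: Python's re has no \R, so (?:\r?\n) stands in for a line break.
ENTRY_RE = re.compile(r"-{37}(?:\r?\n)+((?:.+(?:\r?\n))+)")


def capture_entries(text):
    """Return the block of non-empty lines that follows each 37-hyphen line."""
    return [m.group(1) for m in ENTRY_RE.finditer(text)]


DASH = "-" * 37
STAR = "*" * 37

# Hypothetical sample shaped like the entries in the question.
sample = "\n".join([
    DASH,
    "",
    "   LOTS OF",
    "   TEXT",
    STAR,
    "   MORE",
    STAR,
    "",
    DASH,
    "",
    "   2ND LOT",
    STAR,
]) + "\n"

entries = capture_entries(sample)
for entry in entries:
    # Each captured block can then be analyzed line by line.
    print(entry.splitlines())
```

The asterisk lines stay inside the captured block; they can be filtered out during the line-by-line pass if they are not needed.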

CodePudding user response:

This regex will match both entries:

/-{37}[^-]+/gm

Try it out in regex101.
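
As a rough illustration, the pattern can be tried in Python as below. The capture group around `[^-]+`, the helper name `capture_until_hyphen`, and the sample strings are additions made for this sketch, not part of the answer. One caveat follows directly from the character class: `[^-]+` stops at the first hyphen of any kind, so an entry whose text contains a hyphen is cut short.

```python
import re

# Same pattern as above, with a capture group added around the body (assumption).
DASHES_THEN_BODY = re.compile(r"-{37}([^-]+)")


def capture_until_hyphen(text):
    """Return the text after each 37-hyphen line, up to the next hyphen."""
    return [body.strip() for body in DASHES_THEN_BODY.findall(text)]


RULE = "-" * 37

# Hypothetical input with two entries and no hyphens in the body text.
two_entries = RULE + "\nFIRST\nENTRY\n\n" + RULE + "\nSECOND\nENTRY\n"
print(capture_until_hyphen(two_entries))  # ['FIRST\nENTRY', 'SECOND\nENTRY']

# A hyphen inside the body truncates the capture at that point.
with_hyphen = RULE + "\nwell-known text\n"
print(capture_until_hyphen(with_hyphen))  # ['well']
```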
