Regex: How to multiline-capture lines together that start with an Asterisk

Time:02-03

What I am trying to do

I have a string that looks like this:

foobar

* Level1-1
* Level1-2
** Level2-1
** Level2-2
*** Level3-1
*** Level3-2

foo
foo
foo
bar

* Level1-1
foo

bar

foo
bar

* Level1-1
** Level2-1

foo
bar

I would like to use Regex to capture the lines starting with an Asterisk together, so given the string above I get the following three results captured together:

Result 1

* Level1-1
* Level1-2
** Level2-1
** Level2-2
*** Level3-1
*** Level3-2

Result 2

* Level1-1

Result 3

* Level1-1
** Level2-1

What I tried

I tried to use this regex with a multiline flag (/m):

^(?<Content>\*(.|\n|\r|\n\r)+)(?=[\n\r]+[^\*]+)

The regex as I understand/intended it:

^ = Line/String Start

(?<Content>\*(.|\n|\r|\n\r)+) = The Capture Group that multiline-matches all lines that start with an Asterisk

(?=[\n\r]+[^\*]+) = Positive Lookahead to match any line that does not start with/contain an Asterisk, thus ending the match.

I expected this regex to match what I need, but it actually matches the whole string apart from the first 2 lines and the last line of my string.

I know that I could easily match the single lines with the following regex (^\*.*), but I need the subsequent lines containing Asterisks to go into a single group instead of one group for each line.
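The behaviour described above can be reproduced with a minimal Python sketch. It assumes the sample string has no trailing newline, and it spells the named group as (?P<Content>...) because Python's re module does not accept the (?<Content>...) form; the names TEXT and ATTEMPT are only illustrative. The likely cause is that the alternation inside the group also accepts \n, so the greedy + runs across every line without checking for an asterisk and only backs off far enough for the lookahead to succeed once.

```python
import re

# Sample input from the question (assumed to have no trailing newline).
TEXT = "\n".join([
    "foobar", "",
    "* Level1-1", "* Level1-2", "** Level2-1", "** Level2-2",
    "*** Level3-1", "*** Level3-2", "",
    "foo", "foo", "foo", "bar", "",
    "* Level1-1", "foo", "", "bar", "", "foo", "bar", "",
    "* Level1-1", "** Level2-1", "",
    "foo", "bar",
])

# The attempted pattern, with the named group in Python's (?P<name>...) spelling.
ATTEMPT = re.compile(
    r"^(?P<Content>\*(.|\n|\r|\n\r)+)(?=[\n\r]+[^\*]+)",
    re.MULTILINE,
)

match = ATTEMPT.search(TEXT)
content = match.group("Content")

# Everything before the match: only the first two lines are skipped.
skipped_before = TEXT[:match.start()]
# Everything after the match: only the last line is left over.
left_after = TEXT[match.end():]

print(repr(skipped_before))  # 'foobar\n\n'
print(repr(left_after))      # '\nbar'
```

The single match therefore starts at the first asterisk line and swallows all three blocks plus the text between them.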

The Question

I am not sure what I am doing wrong, especially with the positive lookahead part, and I would be very grateful for any advice on how I can achieve my goal.

CodePudding user response:

I initially started trying to use multiline mode, but then gave up and fell back to a plain regex with no special modes:

(?<=^|\n)\*+.*(?:\n\*+.*)*


Explanation:

  • (?<=^|\n) assert that the match begins at the start of a line
  • \*+ match one or more stars
  • .* match the rest of the line
  • (?:
    • \n match a newline
    • \*+ match one or more stars
    • .* match the rest of the line
  • )* repeat that group zero or more times
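
As a rough illustration, the pattern explained above could be applied in Python like this. One substitution is needed and is an assumption of this sketch, not part of the answer: Python's re module rejects the (?<=^|\n) lookbehind because its two alternatives have different widths, so the sketch uses ^ with re.MULTILINE to assert the start of a line instead. The names TEXT and BLOCKS are illustrative only.

```python
import re

# Sample input from the question (assumed to have no trailing newline).
TEXT = "\n".join([
    "foobar", "",
    "* Level1-1", "* Level1-2", "** Level2-1", "** Level2-2",
    "*** Level3-1", "*** Level3-2", "",
    "foo", "foo", "foo", "bar", "",
    "* Level1-1", "foo", "", "bar", "", "foo", "bar", "",
    "* Level1-1", "** Level2-1", "",
    "foo", "bar",
])

# ^ under re.MULTILINE stands in for the (?<=^|\n) lookbehind.
# \*+.*          one or more stars, then the rest of the line
# (?:\n\*+.*)*   any number of directly following lines that also start with a star
BLOCKS = re.compile(r"^\*+.*(?:\n\*+.*)*", re.MULTILINE)

# No capturing groups, so findall returns each whole match as one string.
blocks = BLOCKS.findall(TEXT)

for number, block in enumerate(blocks, start=1):
    print(f"Result {number}")
    print(block)
    print()
```

Each consecutive run of asterisk lines comes back as one string, so the sample yields three results, matching the grouping asked for in the question.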