What I want

I'm trying to work out how to use a regex to find two groups in RST news files: the change level and the change text. In other words, I want a regex of the form (changelevel): (change text). I was thinking of something like (changelevel): (anything up to the next change level).

For instance, given the following .rst file:

* Major: This is a **Major** change
* Minnor: This is is a minor change with a typo
* Patch: This
is a multiline
  patch

It should return a match with group 1 and group 2, for example:

Match 1:

"* Major: This is a **Major** change"
"* Major: "
"This is a **Major** change"

Match 2:

"* Patch: This\nis a multiline\n  patch"
"* Patch: "
"This\nis a multiline\n  patch"

What I need help with

I cannot write a regex that handles multiline entries and asterisks inside the change text. I tried the following logic:

Match the change level: ^(\*\s+(\w+):\s)
Match anything, with the "dot matches newline" option turned on: .*
A negative lookahead, so that the match stops before the next change level: (?!^(\*\s+(\w+):\s))

I ended up with ^(\*\s+(\w+):\s).*(?!^(\*\s+(\w+):\s)), but .* seems to just swallow everything to the end of the file instead of stopping before the next change level.
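A minimal sketch reproducing the problem, assuming the `+` quantifiers (`\s+`, `\w+`) in the attempt and that both `re.DOTALL` and `re.MULTILINE` are set. The greedy `.*` runs to the end of the text and the lookahead is checked only once, after it, so there is a single match covering the whole file. Moving the lookahead inside the repetition so it is tested before every character (the "tempered greedy token" idiom, swapped in here as an alternative to the answers below) stops group 2 before the next change level.

```python
import re

# Sample news file from the question.
text = ("* Major: This is a **Major** change\n"
        "* Minnor: This is is a minor change with a typo\n"
        "* Patch: This\n"
        "is a multiline\n"
        "  patch")

flags = re.DOTALL | re.MULTILINE

# The attempt from the question: the lookahead is tested only once, after the
# greedy .* has already consumed the rest of the text.
attempt = re.compile(r"^(\*\s+(\w+):\s).*(?!^(\*\s+(\w+):\s))", flags)
whole = [m.group(0) for m in attempt.finditer(text)]
print(len(whole), whole[0] == text)  # 1 True

# Alternative (tempered greedy token, not from the question): before consuming
# each character, check it is not the newline that starts the next "* Level: " line.
tempered = re.compile(r"^(\*\s+\w+:\s)((?:(?!\n\*\s+\w+:\s).)*)", flags)
for level, change in tempered.findall(text):
    print(repr(level), repr(change))
```

Putting `\n` inside the lookahead keeps the line break out of group 2, so each change text ends exactly where the next entry begins.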

What works

I managed to get the first group working with the following regex:

beginning of the line
star in front
then whitespace
a word
colon
whitespace

^(\*\s+(\w+):\s)
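As a quick check, here is the group-1 pattern with its `+` quantifiers written out (`\s+`, `\w+`). With `re.MULTILINE`, `^` anchors at the start of every line, so `re.findall` returns the prefix and the bare level word for each entry of the sample file from the question.

```python
import re

text = ("* Major: This is a **Major** change\n"
        "* Minnor: This is is a minor change with a typo\n"
        "* Patch: This\n"
        "is a multiline\n"
        "  patch")

# Outer group: the "* Level: " prefix; nested group: the bare level word.
prefixes = re.findall(r"^(\*\s+(\w+):\s)", text, re.MULTILINE)
print(prefixes)
# [('* Major: ', 'Major'), ('* Minnor: ', 'Minnor'), ('* Patch: ', 'Patch')]
```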

Answer

You are almost there. Write the pattern so that the lookahead is checked at the start of every following line: match a newline, and if the assertion succeeds (the next line does not start a new change level), match that whole line.

^(\*\s+\w+:\s)(.*(?:\n(?!\*\s+\w+:\s).*)*)

Explanation

^ Start of a line (with re.MULTILINE)
( Capture group 1
- \*\s+\w+:\s Match *, 1+ whitespace chars, 1+ word chars, : and a whitespace char
) Close group 1
( Capture group 2
- .* Match the rest of the line
- (?: Non-capture group to repeat as a whole
  - \n Match a newline
  - (?!\*\s+\w+:\s) Negative lookahead, asserting that the next line does not start with the group 1 pattern
  - .* Match the whole line
- )* Close the non-capture group and optionally repeat it to match all following lines
) Close group 2
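The same pattern can be written with `re.VERBOSE` so each piece of the explanation sits next to the part it describes. This is a sketch of an equivalent spelling, not a change in behaviour: in verbose mode unescaped whitespace is ignored and `#` starts a comment, so the literal `*` stays escaped as `\*`.

```python
import re

# The answer's pattern spelled out with re.VERBOSE; it matches the one-line form.
CHANGE = re.compile(r"""
    ^(\*\s+\w+:\s)          # group 1: "*", whitespace, a word, ":" and one whitespace char
    (                       # group 2: the change text
        .*                  #   the rest of the first line
        (?:                 #   zero or more continuation lines, each:
            \n              #     a newline
            (?!\*\s+\w+:\s) #     not followed by the next change level
            .*              #     then the whole line
        )*
    )
""", re.MULTILINE | re.VERBOSE)

text = ("* Major: This is a **Major** change\n"
        "* Minnor: This is is a minor change with a typo\n"
        "* Patch: This\n"
        "is a multiline\n"
        "  patch")

for level, change in CHANGE.findall(text):
    print(repr(level), repr(change))
```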


Example code:

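import re

# Group 1: the "* Level: " prefix; group 2: the change text, including continuation lines.
pattern = r"^(\*\s+\w+:\s)(.*(?:\n(?!\*\s+\w+:\s).*)*)"

s = ("* Major: This is a **Major** change\n"
    "* Minnor: This is is a minor change with a typo\n"
    "* Patch: This\n"
    "is a multiline\n"
    "  patch")

# re.MULTILINE makes ^ match at the start of every line, not just the string.
result = re.findall(pattern, s, re.MULTILINE)
print(result)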

Output

[('* Major: ', 'This is a **Major** change'), ('* Minnor: ', 'This is is a minor change with a typo'), ('* Patch: ', 'This\nis a multiline\n  patch')]

Answer

re.findall(r'(\*\s*\w+:\s+)([\s\S]*?(?=\n\*|$))', text)

Use a newline \n followed by * (that is, \n\*), or the end of the string $, as the anchor.
Group 1: a literal * followed by zero or more whitespace characters \s*, one or more word characters \w+, a literal : and one or more whitespace characters \s+
Group 2: match everything non-greedily ([\s\S]*?) up to \n\* or $
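A small comparison sketch, assuming the `+` quantifiers are restored in both answers' patterns. On the question's sample both return the same pairs. The second input is a hypothetical entry whose continuation line starts with `*` (RST emphasis): the `\n\*` anchor treats it as the start of a new entry and cuts the text short, while the first answer's lookahead, which requires whitespace after `*`, keeps the whole entry.

```python
import re

# Both answers' patterns, with the + quantifiers restored (assumed lost in formatting).
first = r"^(\*\s+\w+:\s)(.*(?:\n(?!\*\s+\w+:\s).*)*)"  # first answer, used with re.MULTILINE
second = r"(\*\s*\w+:\s+)([\s\S]*?(?=\n\*|$))"         # second answer, no flags

sample = ("* Major: This is a **Major** change\n"
          "* Minnor: This is is a minor change with a typo\n"
          "* Patch: This\n"
          "is a multiline\n"
          "  patch")
# Hypothetical entry: a continuation line that begins with RST emphasis.
tricky = "* Patch: This\n*is* emphasised\n  patch"

for text in (sample, tricky):
    a = re.findall(first, text, re.MULTILINE)
    b = re.findall(second, text)
    print(a == b, b)
```

For the hypothetical input the second pattern returns only `[('* Patch: ', 'This')]`, so if change texts may contain lines starting with `*`, the stricter lookahead from the first answer is the safer choice.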