This seems like a simple match, but I can't figure out how to match all text that starts with a known block of text and ends with a semicolon followed by a newline. What I have right now mostly works:
pattern = r'''[ ]+(value \w+\n)([^;]+)'''
It parses an example section of text like this:
value Y1N5NALC
1 = 'Yes'
5 = 'No'
7 = 'Not ascertained' ;
value AGESCRN
15 = '15 years'
16 = '16 years';
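A minimal Python sketch of that pattern, assuming the "+" quantifiers that seem to have been lost from it in formatting, and assuming each value line is indented (the leading [ ]+ requires at least one space). The indentation in the sample string is hypothetical:

```python
import re

# The question's pattern, assuming "+" quantifiers were lost in formatting.
pattern = r'''[ ]+(value \w+\n)([^;]+)'''

# Sample from above; the indentation is an assumption ([ ]+ needs it).
text = (
    "  value Y1N5NALC\n"
    "    1 = 'Yes'\n"
    "    5 = 'No'\n"
    "    7 = 'Not ascertained' ;\n"
    "  value AGESCRN\n"
    "    15 = '15 years'\n"
    "    16 = '16 years';\n"
)

# Each match is a (header, body) tuple; body stops just before the ";".
for header, body in re.findall(pattern, text):
    print(header.strip(), body.splitlines())
```

re.findall returns one tuple per block because the pattern has two groups.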
However, if any of the key/value pairs contains a semicolon inside its quoted string, the match ends early, because ([^;]+) stops at the first semicolon it finds. An example:
value Y1N5NALC
1 = 'Yes'
5 = 'No;Maybe'
7 = 'Not ascertained' ;
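Running the same pattern on this sample shows where it breaks. This sketch again assumes the lost "+" quantifiers and a hypothetical indentation of the value line:

```python
import re

# The question's pattern, assuming "+" quantifiers were lost in formatting.
pattern = r'''[ ]+(value \w+\n)([^;]+)'''

# Hypothetically indented sample; note the ";" inside 'No;Maybe'.
text = (
    "  value Y1N5NALC\n"
    "    1 = 'Yes'\n"
    "    5 = 'No;Maybe'\n"
    "    7 = 'Not ascertained' ;\n"
)

header, body = re.findall(pattern, text)[0]
# [^;]+ stops at the first ";" anywhere, here inside the quoted string.
print(repr(body))
```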
What I'd like to do is end the match at a semicolon, followed by optional spaces or tabs, followed by a newline. Using ([^;\n]+) fails because the negated character class also excludes the newline, so the match stops at the end of the first line.
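A quick check of the ([^;\n]+) attempt, assuming the lost "+" quantifiers and a hypothetical indentation, shows Group 2 stopping after the first key/value line:

```python
import re

# Attempted fix ("+" assumed restored): the class now excludes "\n" too.
pattern = r'''[ ]+(value \w+\n)([^;\n]+)'''

text = (
    "  value Y1N5NALC\n"
    "    1 = 'Yes'\n"
    "    5 = 'No;Maybe'\n"
    "    7 = 'Not ascertained' ;\n"
)

header, body = re.findall(pattern, text)[0]
# Group 2 cannot cross a line break, so it ends after the first pair.
print(repr(body))
```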
Answer:
You can use
(?sm)^ +(value \w+\n)(.*?);$
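A minimal sketch applying this regex with Python's re module, assuming the "+" quantifiers were lost in formatting (so the pattern reads (?sm)^ +(value \w+\n)(.*?);$) and that the value lines are indented, as in the hypothetical sample below:

```python
import re

# The answer's regex, assuming "+" quantifiers were lost in formatting.
pattern = re.compile(r"(?sm)^ +(value \w+\n)(.*?);$")

# Hypothetically indented sample, including the ";" inside 'No;Maybe'.
text = (
    "  value Y1N5NALC\n"
    "    1 = 'Yes'\n"
    "    5 = 'No;Maybe'\n"
    "    7 = 'Not ascertained' ;\n"
    "  value AGESCRN\n"
    "    15 = '15 years'\n"
    "    16 = '16 years';\n"
)

for m in pattern.finditer(text):
    # Group 1 is the header line, Group 2 everything up to a line-final ";".
    print(m.group(1).strip(), m.group(2).splitlines())
```

The semicolon inside 'No;Maybe' does not end the match, because it is not at the end of a line.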
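Details:
(?sm) - re.S and re.M are on, so . also matches line breaks and ^ and $ match at the start and end of every line
^ + - start of a line, followed by one or more spaces
(value \w+\n) - Group 1: value, a space, one or more word chars, and an LF line break
(.*?) - Group 2: any zero or more chars, as few as possible
; - a ;
$ - at the end of a line.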
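To illustrate why Group 2 uses the lazy .*?, this sketch compares it with a greedy .* on two blocks (again assuming the lost "+" quantifiers and hypothetical indentation). The greedy version runs to the last semicolon at a line end and merges both blocks into one match:

```python
import re

# Two blocks; "+" quantifiers and indentation are assumptions.
text = (
    "  value Y1N5NALC\n"
    "    5 = 'No;Maybe'\n"
    "    7 = 'Not ascertained' ;\n"
    "  value AGESCRN\n"
    "    16 = '16 years';\n"
)

lazy = re.findall(r"(?sm)^ +(value \w+\n)(.*?);$", text)
greedy = re.findall(r"(?sm)^ +(value \w+\n)(.*);$", text)

print(len(lazy), len(greedy))  # 2 1
```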
If the file may have CRLF line endings, you need
(?sm)^ +(value \w+\r?\n)(.*?);\r?$
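A sketch comparing the LF-only and CRLF-tolerant patterns on CRLF text, assuming the lost "+" quantifiers and hypothetical indentation. The last pattern, which also accepts spaces or tabs between the semicolon and the line break as the question asked, is a hypothetical variant and not part of the answer:

```python
import re

# "+" and indentation are assumptions; this sample uses CRLF line endings.
text = (
    "  value Y1N5NALC\r\n"
    "    1 = 'Yes'\r\n"
    "    5 = 'No;Maybe'\r\n"
    "    7 = 'Not ascertained' ;\r\n"
    "  value AGESCRN\r\n"
    "    16 = '16 years';\r\n"
)

lf_only = re.compile(r"(?sm)^ +(value \w+\n)(.*?);$")
crlf = re.compile(r"(?sm)^ +(value \w+\r?\n)(.*?);\r?$")
print(len(lf_only.findall(text)), len(crlf.findall(text)))  # 0 2

# Hypothetical variant, not from the answer: also allow spaces or tabs
# between the ";" and the line break.
crlf_ws = re.compile(r"(?sm)^ +(value \w+\r?\n)(.*?);[ \t]*\r?$")
padded = text.replace("' ;\r\n", "' ; \t\r\n")
print(len(crlf.findall(padded)), len(crlf_ws.findall(padded)))  # 1 2
```

On the padded text, crlf silently merges both blocks: when a block's terminator is not recognised, the lazy .*? keeps going until the next line-final semicolon.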