I'd like to capture everything from one "line break followed by an integer" up to the next "line break followed by an integer", then everything from there to the next one, and so on. For example, for the following string:
"\na \n1 b\nc \n2 b\nc \n3 b\nc"
I'd like to capture the following groups:
["\n1 b\nc ", "\n2 b\nc ", "\n3 b\nc"]
This is what I've tried:
re.findall("\n\d[\s\S]*(?=\n\d)*","\na \n1 b\nc \n2 b\nc \n3 b\nc")
But it doesn't split the matches. I think I need to make it "non-greedy", but I'm not sure how. It returns a single match:
['\n1 b\nc \n2 b\nc \n3 b\nc']
CodePudding user response:
You may use this regex in DOTALL or single line mode:
(?s)\n\d.*?(?=\n\d|\Z)
RegEx Details:
(?s) : Enable single line mode to allow dot to match line break
\n : Match a line break
\d : Match a digit
.*? : Match 0 or more of any characters (lazy)
(?=\n\d|\Z) : Lookahead to assert that we have either another line break and digit or end of input
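To see why both the lazy quantifier and the lookahead are needed, here is a small comparison sketch. The variable names are only illustrative, and the greedy pattern is a simplified version of the attempt in the question (without the repeated lookahead), not the exact original:

```python
import re

s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"

# Greedy: [\s\S]* consumes everything up to the end of the string,
# so the first "\n<digit>" swallows all later ones into a single match.
greedy = re.findall(r"\n\d[\s\S]*", s)

# Lazy without a lookahead: .*? is satisfied with zero characters,
# so each match is only the line break and the digit.
lazy_no_lookahead = re.findall(r"(?s)\n\d.*?", s)

# Lazy with a lookahead: .*? grows one character at a time until the
# next "\n<digit>" (or the end of input) is directly ahead.
lazy = re.findall(r"(?s)\n\d.*?(?=\n\d|\Z)", s)

print(greedy)
print(lazy_no_lookahead)
print(lazy)
```

The lookahead does not consume the next "\n<digit>", which is why the following match can start exactly there.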
Code:
>>> import re
>>> s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"
>>> re.findall(r'(?s)\n\d.*?(?=\n\d|\Z)', s)
['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']
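If you prefer a flag over the inline (?s), the same pattern could be compiled once with re.DOTALL and wrapped in a helper. This is just a sketch: NUMBERED_BLOCK and split_numbered_blocks are made-up names for illustration, not part of the re module.

```python
import re

# Hypothetical names; re.DOTALL is the flag equivalent of the inline (?s).
NUMBERED_BLOCK = re.compile(r"\n\d.*?(?=\n\d|\Z)", re.DOTALL)


def split_numbered_blocks(text):
    """Return every chunk that starts at a line break followed by a digit."""
    return NUMBERED_BLOCK.findall(text)


blocks = split_numbered_blocks("\na \n1 b\nc \n2 b\nc \n3 b\nc")
print(blocks)

# Multi-digit integers need no change to the pattern: \d matches the
# first digit and the lazy .*? picks up the remaining ones.
print(split_numbered_blocks("\n10 x\n11 y"))
```

Text before the first "line break and digit" (here "\na ") is skipped, and a string with no such marker yields an empty list.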