Regex to match everything after a pattern occurrence until the next pattern occurs and so on

Time:02-16

I'd like to extract every chunk that starts with a "line break followed by an integer" and runs up to (but not including) the next "line break followed by an integer", and so on for each occurrence. For example, for the following string:

"\na \n1 b\nc \n2 b\nc \n3 b\nc"

I'd like to capture the following groups:

["\n1 b\nc ", "\n2 b\nc ", "\n3 b\nc"]

This is what I've tried:

re.findall("\n\d[\s\S]*(?=\n\d)*","\na \n1 b\nc \n2 b\nc \n3 b\nc")

But it doesn't split the matches; everything ends up in a single match. I think I need to make it "non-greedy", but I'm not sure how:

['\n1 b\nc \n2 b\nc \n3 b\nc']

Answer:

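Your attempt returns a single match because [\s\S]* is greedy and the * after (?=\n\d) lets the lookahead match zero times, so nothing stops the match before the end of the string. Make the repetition lazy and require the lookahead (or the end of input). You can use this regex in DOTALL (single-line) mode: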

(?s)\n\d.*?(?=\n\d|\Z)


RegEx Details:

  • (?s): Enable single-line (DOTALL) mode so that . also matches line breaks
  • \n: Match a line break
  • \d: Match a digit
  • .*?: Match zero or more characters of any kind, as few as possible (lazy)
  • (?=\n\d|\Z): Lookahead to assert that we have either another line break and digit or end of input
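The same pattern can also be written without the inline (?s). A minimal sketch, assuming Python 3: passing re.DOTALL as a flag is equivalent, and the [\s\S] class from the question works without any flag at all, because it already matches newlines; the only changes it needs are the lazy *? and the required lookahead. The variable names here are illustrative.

```python
import re

s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"

# Same regex as above, with DOTALL passed as a flag instead of the inline (?s)
with_flag = re.findall(r'\n\d.*?(?=\n\d|\Z)', s, flags=re.DOTALL)

# No flag needed: [\s\S] matches any character, including newlines.
# Compared with the question's attempt, only the lazy *? and the
# non-optional lookahead (?=\n\d|\Z) have changed.
char_class = re.findall(r'\n\d[\s\S]*?(?=\n\d|\Z)', s)

print(with_flag)
print(char_class)
```

Both calls print ['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc'].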

Code:

>>> import re
>>> s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"
>>> re.findall(r'(?s)\n\d.*?(?=\n\d|\Z)', s)
['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']
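If you also need the integer on its own, or prefer splitting to matching, two variations are sketched below. They go beyond the original answer: the capture groups and the \d+ (an assumption that markers may have more than one digit) are additions, and splitting on a zero-width lookahead with re.split requires Python 3.7 or later.

```python
import re

s = "\na \n1 b\nc \n2 b\nc \n3 b\nc"

# Variation 1: capture the integer and the text after it separately.
# Assumption: markers may have several digits, hence \d+.
pairs = re.findall(r'(?s)\n(\d+)(.*?)(?=\n\d|\Z)', s)
print(pairs)  # [('1', ' b\nc '), ('2', ' b\nc '), ('3', ' b\nc')]

# Variation 2: split at every position that is followed by a line break
# and a digit (zero-width split, Python 3.7+). The first piece is the
# text before the first marker ('' if the string starts with one), so drop it.
chunks = re.split(r'(?=\n\d)', s)[1:]
print(chunks)  # ['\n1 b\nc ', '\n2 b\nc ', '\n3 b\nc']
```

The split version keeps each marker attached to its chunk, so it yields the same list as the findall in the answer.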