Python regex, multiline search with start of line

Time:04-03

I have a block of text as shown below.

import re

one = """
ASDFABC
ABC
ABC
ABC
ASDF
ASDF
ASDF
ASDFABC"""

two = """\
ABC
ABC
ABC

ASDF
ASDF
ASDF
ASDFABC"""

I am looking for a way to replace each whole block of consecutive ABC lines with a single "TEST". For example, variable one should become:

"""
ASDFABC
TEST
ASDF
ASDF
ASDF
ASDFABC"""

As a side note, ABC can also appear somewhere other than the start of a line, as in the first and last lines of "one"; those occurrences should be ignored. Also, as "two" shows, ABC is not necessarily followed by "\n".

How can this be done?

My attempts so far:

>>> re.findall(r"(?:\nABC.*)+", one)
['\nABC\nABC\nABC']

>>> re.findall(r"(?:\nABC.*)+", two)
['\nABC\nABC']

>>> re.findall(r"(?:\nABC.*)+", two, re.M)
['\nABC\nABC']

>>> re.findall(r"(?:\nABC.*)+", two, re.MULTILINE|re.DOTALL)
['\nABC\nABC\n\nASDF\nASDF\nASDF\nASDFABC']

>>> re.findall(r"(?:^ABC.*)+", two, re.MULTILINE|re.DOTALL)
['ABC\nABC\nABC\n\nASDF\nASDF\nASDF\nASDFABC']

>>> re.findall(r"(?:^ABC.*)+", two, re.MULTILINE)
['ABC', 'ABC', 'ABC']

>>> re.findall(r"(?:\n*ABC.*)+", two, re.MULTILINE)
['ABC\nABC\nABC', 'ABC']

CodePudding user response:

You can use

re.sub(r'^ABC(?:\nABC)*$', 'TEST', text, flags=re.M)

Details:

  • ^ - start of a line (due to re.M)
  • ABC - a fixed string
  • (?:\nABC)* - zero or more repetitions of an LF character followed by the string ABC
  • $ - end of a line.
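As a quick check, the pattern can be run against the two sample strings from the question. This is a minimal sketch; the helper name collapse_abc_blocks is made up for illustration and is not part of the re module.

```python
import re

one = """
ASDFABC
ABC
ABC
ABC
ASDF
ASDF
ASDF
ASDFABC"""

two = """\
ABC
ABC
ABC

ASDF
ASDF
ASDF
ASDFABC"""


def collapse_abc_blocks(text):
    # Hypothetical helper: replace every run of consecutive lines that
    # consist of exactly "ABC" with a single "TEST" line.
    return re.sub(r'^ABC(?:\nABC)*$', 'TEST', text, flags=re.M)


result_one = collapse_abc_blocks(one)
result_two = collapse_abc_blocks(two)

print(repr(result_one))
print(repr(result_two))
```

The ABC at the end of the ASDFABC lines is left alone because ^ only matches at the start of a line, and the block at the very beginning of "two" is still found because ^ also matches at the start of the string.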

Note that re.M must be passed by keyword, as flags=re.M, because the next positional argument of re.sub after the input string is count, not flags.
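The difference can be seen in a small sketch. The sample string here is an assumption chosen for illustration, and the warning filter is only there because some newer Python versions may emit a DeprecationWarning when count is passed positionally.

```python
import re
import warnings

pattern = r'^ABC(?:\nABC)*$'
text = "ASDFABC\nABC\nABC\nASDF"

# Passed positionally, re.M (integer value 8) lands in the count slot.
# No flag is set, so ^ can only match at the very start of the string.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    wrong = re.sub(pattern, 'TEST', text, re.M)

# Passed by keyword, re.M is treated as a flag and ^ matches at each line start.
right = re.sub(pattern, 'TEST', text, flags=re.M)

print(repr(wrong))
print(repr(right))
```

With the positional call the text comes back unchanged, which makes this mistake easy to overlook because no error is raised.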
