Replace string between two strings unless it contains a substring

Time:12-25

I have a multiline string consisting of three lines of the following form:

Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3

I wish to replace all texts between Text1 and Text3 with Text4, unless the intermediate text contains the character !. Thus, the desired output is:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Let c be the multiline string above. I believe re.sub is the natural choice for this problem, so I tried the following:

c = re.sub("Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

However, it replaces every intermediate text with Text4. That is, I get the following output:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text4 Text3

How can I resolve this?

CodePudding user response:

I would phrase this as:

import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

c = re.sub(r"^Text1(?: [^\s!]+)+ Text3$", "Text1 Text4 Text3", c, flags=re.M)
print(c)

This prints:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Here is an explanation of the regex pattern used:

  • ^ from the start of the line (re.M is multiline mode)
  • Text1 match "Text1"
  • (?: [^\s!]+)+ then match one or more space-separated words, none containing whitespace or !
  • " Text3" match a space and then "Text3"
  • $ end of the line
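
To make the breakdown above easier to check, here is a minimal sketch of the same idea written with re.VERBOSE, so each piece of the pattern carries its own comment. It assumes the pattern is ^Text1(?: [^\s!]+)+ Text3$; the names LINE_PATTERN and replace_middle are made up for this illustration.

```python
import re

# The pattern from the explanation, spread over several lines with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored, so a literal space is
# written as "[ ]".
LINE_PATTERN = re.compile(
    r"""
    ^Text1            # start of a line, then "Text1"
    (?:[ ][^\s!]+)+   # one or more words, each preceded by a space, with no "!"
    [ ]Text3$         # a space, "Text3", then the end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)


def replace_middle(text: str) -> str:
    """Replace the words between Text1 and Text3 unless they contain '!'."""
    return LINE_PATTERN.sub("Text1 Text4 Text3", text)


sample = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"
print(replace_middle(sample))
```

Because the pattern is anchored with ^ and $, a line is either replaced as a whole or left untouched, which is what keeps the third line intact.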

CodePudding user response:

You don't really need a negative lookahead to achieve your result. Matching anything except the ! character does the job. Modifying your regex as follows fixes the issue (a raw string avoids the invalid "\!" escape, and ! needs no escaping inside a character class):

c = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

You can play with it online here and understand more about the regex here.
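
For context on why the attempt in the question replaced every line, the sketch below runs both patterns side by side. The likely cause is that (?!=\!) is not a "not preceded by !" test: it is a negative lookahead for the two characters =!, which never appear before Text3, so it filters nothing. The variable names original and fixed are only for this illustration.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"

# (?!=!) means "the next two characters are not '=!'". Right before Text3 the
# next character is 'T', so the lookahead always succeeds and excludes nothing.
original = re.sub(r"Text1(.*?)(?!=!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

# [^!]*? cannot consume a '!', so the third line no longer matches.
fixed = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c)

print(original)
print(fixed)
```

Note that [^!] also matches newlines, so with input where a line lacks Text3 the match could run on into the next line; for the three lines in the question this does not come up.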

CodePudding user response:

Use the non-greedy .*? pattern to match as little text as possible before attempting to match the next part of the pattern. On its own this does not exclude the ! character, so add a lookahead assertion directly after Text1 to check whether ! is present in the intermediate text. A positive lookahead placed right before Text3, such as (?=!)Text3, can never match, because the next character cannot be both ! and T. A negative lookahead, (?![^\n]*!), rejects any line that contains ! after Text1, as in the following example:

import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

# (?![^\n]*!) makes the match fail when a ! appears later on the same line
c = re.sub(r"Text1(?![^\n]*!).*?Text3", "Text1 Text4 Text3", c)

print(c)
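
Another way to express the "check for !" step is to do it in Python rather than inside the pattern, by passing a function as the replacement argument of re.sub. This is a sketch of an alternative, not the approach used in the answers above; the names replace_unless_bang and swap are hypothetical.

```python
import re


def replace_unless_bang(text: str, replacement: str = "Text4") -> str:
    """Replace the text between Text1 and Text3 unless it contains '!'."""

    def swap(match: re.Match) -> str:
        middle = match.group(1)
        if "!" in middle:
            return match.group(0)  # leave this occurrence untouched
        return f"Text1 {replacement} Text3"

    # Without re.DOTALL, '.' stops at newlines, so each match stays on one line.
    return re.sub(r"Text1(.*?)Text3", swap, text)


sample = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"
print(replace_unless_bang(sample))
```

The pattern stays simple and the exclusion rule lives in ordinary code, which can be easier to extend if more conditions are added later.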
