I have a multiline string with three lines of the following form:
Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3
I wish to replace all text between Text1 and Text3 with Text4, unless the intermediate text contains the character !. Thus, the desired output is:
Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3
Let c be the multiline string above. I believe re.sub is the natural choice for this problem, so I tried the following:
c = re.sub("Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)
However, it replaces every intermediate text with Text4. That is, I get the following output:
Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text4 Text3
How can I resolve this?
CodePudding user response:
I would phrase this as:
import re
c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""
c = re.sub(r"^Text1(?: [^\s!]+)+ Text3$", "Text1 Text4 Text3", c, flags=re.M)
print(c)
This prints:
Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3
Here is an explanation of the regex pattern used:
^                from the start of the line (re.M is multiline mode)
Text1            match "Text1"
(?: [^\s!]+)+    then match one or more space-separated, non-whitespace terms NOT containing !
Text3            match a space and then "Text3"
$                end of the line
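As a quick check of the pattern described above, here is a minimal sketch that wraps the substitution in a helper and also runs it on a line with several intermediate words. The name replace_middle and the sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern as explained above: one or more space-separated terms,
# none of which may contain whitespace or '!'.
PATTERN = re.compile(r"^Text1(?: [^\s!]+)+ Text3$", flags=re.M)

def replace_middle(text):
    """Hypothetical helper: swap the middle terms for Text4 on lines without '!'."""
    return PATTERN.sub("Text1 Text4 Text3", text)

sample = "Text1 Text2a Text3\nText1 a b c Text3\nText1 Text2! Text3"
result = replace_middle(sample)
print(result)
```

Because the pattern is anchored with ^ and $, lines that carry anything before Text1 or after Text3 are left unchanged.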
CodePudding user response:
You don't really need a negative lookahead to achieve your results. Matching anything except the ! character would do just fine. Modifying your regex as follows fixes the issue:
c = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c)
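For context on why the attempt in the question replaced every line: (?!=\!) is not a lookbehind. It is a negative lookahead for the two literal characters =!, which never occur right before Text3, so the assertion always succeeds. The sketch below compares that pattern with the negated character class; the variable names original and fixed are illustrative.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"

# (?!=!) means "not followed by the literal text '=!'", so it never blocks a match.
original = re.sub(r"Text1(.*?)(?!=!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

# [^!]*? cannot step over a '!', so the third line is left alone.
fixed = re.sub(r"Text1[^!]*?Text3", "Text1 Text4 Text3", c)

print(original)
print(fixed)
```

Note that [^!] also matches newlines, so if a line could lack Text3, using [^!\n] instead would keep each match on a single line.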
CodePudding user response:
Use the lazy .*? pattern to match as little text as possible before Text3. To leave lines containing ! untouched, add a negative lookahead assertion, (?!.*!), right after Text1; it rejects the match when a ! appears later on the same line. re.DOTALL must not be set here, so that . stops at line breaks, as in the following example:
import re
c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""
c = re.sub(r"Text1(?!.*!).*?Text3", "Text1 Text4 Text3", c)
print(c)
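Another way to test for ! in the intermediate text is to let re.sub call a function for each match and decide there whether to replace it. This is a minimal sketch of that idea; the helper name swap_unless_bang is hypothetical.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"

def swap_unless_bang(match):
    # Hypothetical helper: keep the original text when the middle part contains '!'.
    if "!" in match.group(1):
        return match.group(0)
    return "Text1 Text4 Text3"

# Without re.DOTALL, '.' stops at newlines, so each match stays on one line.
result = re.sub(r"Text1(.*?)Text3", swap_unless_bang, c)
print(result)
```

This keeps the regex itself simple and moves the exclusion rule into ordinary Python, which can be easier to extend if more conditions are added later.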