Python re.sub always returns the original string value and ignores given pattern

My code is below:

import re

old = """
B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
duplicate = """
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
final = re.sub(r"\b{}\b".format(duplicate),"",old)
print(final)

final always prints the same value as old. I want the values listed in duplicate to be removed from old.

CodePudding user response：

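A block string should not start or end on a new line, because each of those line breaks adds a \n character to the string. In the pattern, the trailing \n is then followed by \b, and at the end of old there is no word character next to that position, so the pattern never matches. Try: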

old = """B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433"""  # the last three lines are the ones to remove

duplicate = """B0B78CWZ91
B09T8TJ65B
B09K55Z433"""

With these definitions, the same re.sub call removes the duplicate block, so final is no longer equal to old.

Output

B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
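To see why the original call never matched, a minimal sketch can compare both variants side by side. It reuses the data from the question and, as an assumption for brevity, trims the strings with strip() instead of retyping them. The names unchanged and fixed are only illustrative.

```python
import re

# Block strings written as in the question: each starts and ends with "\n".
old = """
B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
duplicate = """
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""

# Original pattern: \b + "\nB0B7...\n" + \b. The final \b sits between the
# trailing newline and the end of the string, so it cannot match there.
unchanged = re.sub(r"\b{}\b".format(duplicate), "", old)

# Same call after trimming the surrounding newlines from both strings.
fixed = re.sub(r"\b{}\b".format(duplicate.strip()), "", old.strip())

print(unchanged == old)  # True: nothing was replaced
print(fixed)
```

The second call leaves only the first four IDs, each followed by a newline.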

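Alternatively, keep the block layout and put a backslash right after the opening quotes and at the end of the last line; a backslash directly before a line break removes that line break from the string: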

"""\
B0B78CWZ91
B09T8TJ65B
B09K55Z433\
"""

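As a quick check of the backslash form, the sketch below compares it with the same string written without surrounding line breaks. The variable names are placeholders chosen for this example.

```python
# A backslash at the very end of a line inside a (non-raw) triple-quoted
# string joins that line with the next one, so no "\n" is stored there.
with_backslashes = """\
B0B78CWZ91
B09T8TJ65B
B09K55Z433\
"""

plain = """B0B78CWZ91
B09T8TJ65B
B09K55Z433"""

print(with_backslashes == plain)  # True
print(repr(with_backslashes))     # only the two inner newlines remain
```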
CodePudding user response：

It seems you can use

final = re.sub(r"(?!\B\w){}(?<!\w\B)".format(re.escape(duplicate.strip())),"",old)

Note several things here:

duplicate.strip() - leading and trailing whitespace can prevent a match, so strip() removes it from duplicate
re.escape(...) - any special regex characters in the string are escaped so that they are matched literally
(?!\B\w) and (?<!\w\B) - adaptive word boundaries: a word boundary is required at the start or end of the match only when the pattern itself starts or ends with a word character.
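Putting these pieces together, a minimal runnable sketch might look like the following. The helper name remove_block is hypothetical, and the extra price and ID strings are made-up inputs that only illustrate how the adaptive boundaries behave.

```python
import re

def remove_block(text, block):
    # Hypothetical helper: strip the block, escape it, and wrap it in
    # adaptive word boundaries before substituting.
    pattern = r"(?!\B\w){}(?<!\w\B)".format(re.escape(block.strip()))
    return re.sub(pattern, "", text)

old = """
B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
duplicate = """
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""

final = remove_block(old, duplicate)
print(final)

# A block that starts with a non-word character still matches, which a
# plain \b on both sides would not allow.
print(remove_block("price: $5.00 (approx)", "$5.00"))

# A block that is only part of a longer word is left alone.
print(remove_block("B09K55Z4339", "B09K55Z433"))  # B09K55Z4339
```

Note that the untrimmed old keeps its own leading and trailing newlines here; call old.strip() first if those are unwanted.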