How to write a Python re.sub pattern that ignores single or double quotes?

Time:05-11

import re

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""
pattern = r'\babc\b'
mystr = re.sub(pattern, "nnn", mystr)
print(mystr)
# {nnn} [nnn] (nnn) ['nnn'] ["nnn"]

But I want it to return {nnn} [nnn] (nnn) ['abc'] ["abc"]

How can I make the pattern ignore abc when it is wrapped in single or double quotes?

CodePudding user response:

You can use a regex pattern that matches abc only when it sits directly inside curly braces, square brackets, or parentheses:

import re

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""
output = re.sub(r'([{(\[])abc([})\]])', r'\1nnn\2', mystr)
print(output)  # {nnn} [nnn] (nnn) ['abc'] ["abc"]

For a more general solution that replaces any bracketed element that is not quoted, use re.sub with a callback function:

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""
output = re.sub(r'([{(\[])(.*?)([})\]])', lambda m: m.group(1) + 'nnn' + m.group(3) if not re.search(r"^['\"].*['\"]$", m.group(2)) else m.group(), mystr)
print(output)  # {nnn} [nnn] (nnn) ['abc'] ["abc"]
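
The one-line callback is dense, so here is a more readable sketch of the same idea. The names BRACKETED, QUOTED, and replace_unquoted are made up for illustration, and the quote check is slightly stricter than the one above because it assumes the opening and closing quote must be the same character.

```python
import re

# Opening bracket, lazily matched content, closing bracket.
BRACKETED = re.compile(r'([{(\[])(.*?)([})\]])')
# Content that starts and ends with the same quote character.
QUOTED = re.compile(r"""^(['"]).*\1$""")

def replace_unquoted(match):
    """Return the match unchanged if its content is quoted, else swap in 'nnn'."""
    opener, inner, closer = match.groups()
    if QUOTED.search(inner):
        return match.group()
    return opener + 'nnn' + closer

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""
output = BRACKETED.sub(replace_unquoted, mystr)
print(output)  # {nnn} [nnn] (nnn) ['abc'] ["abc"]
```

Like the lambda version, this replaces whatever sits between the brackets, not only abc.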

CodePudding user response:

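If you want a simple solution that replaces abc when it is bounded by any non-word characters other than quotes, you can change your pattern to the one below. The two character classes consume the characters on either side of abc, so capture them and put them back in the replacement, for example with r'\1nnn\2':

pattern = r'([^\w\'"])abc([^\w\'"])'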
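
A minimal sketch of how such a pattern could be applied, assuming the bounding characters are captured in groups and restored through the replacement string so the brackets are not lost:

```python
import re

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""

# Groups 1 and 2 hold the non-word, non-quote characters on each side of abc.
pattern = r'([^\w\'"])abc([^\w\'"])'

# \1 and \2 put the captured neighbours back around the replacement.
output = re.sub(pattern, r'\1nnn\2', mystr)
print(output)  # {nnn} [nnn] (nnn) ['abc'] ["abc"]
```

Because the pattern needs a real character on both sides, it will not match abc at the very start or end of the string.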

CodePudding user response:

You can ignore the single or double quotes by combining the word boundaries with lookarounds: a negative lookbehind (?<![\'"]) asserts that there is no single or double quote directly to the left, and a negative lookahead (?![\'"]) asserts that there is none directly to the right.

Example

import re

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""
output = re.sub(r'\b(?<![\'"])abc\b(?![\'"])', r'nnn', mystr)
print(output)

Output

{nnn} [nnn] (nnn) ['abc'] ["abc"]
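
The same lookaround idea can be wrapped in a small helper for arbitrary words. This is only a sketch: replace_unquoted_word is a hypothetical name, and it assumes the word consists of word characters so that \b behaves as expected.

```python
import re

def replace_unquoted_word(text, word, replacement):
    """Replace `word` unless a quote sits directly before or after it."""
    # re.escape keeps any regex metacharacters in `word` literal.
    pattern = r'(?<![\'"])\b' + re.escape(word) + r'\b(?![\'"])'
    # A callable replacement avoids backslash processing in `replacement`.
    return re.sub(pattern, lambda m: replacement, text)

mystr = """{abc} [abc] (abc) ['abc'] ["abc"]"""
print(replace_unquoted_word(mystr, 'abc', 'nnn'))
# {nnn} [nnn] (nnn) ['abc'] ["abc"]
```

The lookarounds only inspect the characters immediately next to the word, so a word in the middle of a longer quoted string would still be replaced.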

If you want to pair up the same opening and closing brackets with the same quotes, you can use a pattern with an alternation, capture groups, and backreferences that first matches what you don't want to replace.

The last alternative is capture group 4, which contains what you eventually want to replace; the callback passed to re.sub checks whether group 4 matched.

import re

pattern = r"{([\"'])[^{}]*\1}|\[([\"'])[^][]*\2]|\(([\"'])[^()]*\3\)|((?<={)[^{}]*(?=})|(?<=\()[^()]*(?=\))|(?<=\[)[^][]*(?=]))"

s = ("{abc} [abc] (abc) ['abc'] [\"abc\"]\n"
            "{\"abc\"} ('abc')(\"abc\")\n"
            "{abc](\"abc\"}{'abc\"}")

result = re.sub(pattern, lambda m: 'nnn' if m.group(4) else m.group(), s)
print(result)

Output

{nnn} [nnn] (nnn) ['abc'] ["abc"]
{"abc"} ('abc')("abc")
{nnn}{nnn}
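
The "match what you don't want first" technique can also be shown in a reduced form. The sketch below ignores bracket pairing entirely and only skips quoted strings; it assumes quotes are balanced and never escaped, and the names pattern, keep_or_replace, and text are illustrative.

```python
import re

# Alternative 1 consumes a whole quoted string (group 1 is its quote character).
# Alternative 2 (group 2) is the word to replace, reached only outside quotes.
pattern = re.compile(r"""(['"]).*?\1|(\babc\b)""")

def keep_or_replace(m):
    # Group 2 is set only when the unquoted alternative matched.
    return 'nnn' if m.group(2) else m.group()

text = """{abc} [abc] (abc) ['abc'] ["abc"] 'x abc y'"""
result = pattern.sub(keep_or_replace, text)
print(result)  # {nnn} [nnn] (nnn) ['abc'] ["abc"] 'x abc y'
```

Because the quoted alternative is tried first at each position, any abc inside a quoted string is consumed as part of that string and returned unchanged.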

