I'm trying to match closing curly braces in Python unless they're escaped. The test case I'm using is this:
import re

s = "aa}}bb"
co = re.compile(r"(^|[^e])(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3)) # This outputs None?!
The intent of the RegEx pattern is "either there is no character in front of the curly brace, or there is a character that isn't the escape (here e)". The last substring I'm searching is s[3:] == "}bb", or so I thought. It doesn't match the pattern, however, and although this is quite strange, I guess this is because
- the RegEx-created substring does know that there is no start-of-line before it, and
- it doesn't know that there was any character in front of it.
In other words: s[3:] is not actually what's being searched. One way I see to circumvent this is to just call co.search(s[3:]), which will give me the start-of-line. I'd like to just use search's pos argument instead of slicing, because I'm working with big strings and slicing copies memory. Can it be done?
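A minimal sketch of the difference described above, reusing the test string and pattern from the question. The variable names with_pos and with_slice are only illustrative:

```python
import re

s = "aa}}bb"
co = re.compile(r"(^|[^e])(})")

# Passing pos=3 searches the original string from index 3 onwards,
# so ^ cannot match there: index 3 is not the real start of the string.
with_pos = co.search(s, 3)

# Slicing builds a new string ("}bb") whose real start is the brace,
# so the ^ alternative matches, at the cost of copying the tail.
with_slice = co.search(s[3:])

print(with_pos)
print(with_slice)
```

The first call finds nothing, while the second matches the brace at index 0 of the sliced copy.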
CodePudding user response:
Yes, that's documented behaviour:
"pos ... is not completely equivalent to slicing the string; the ^ pattern character matches at the real beginning of the string ... but not necessarily at the index where the search is to start."
What you probably want here is a "negative lookbehind", which is written (?<!...); so with an escape of e, that'd be (?<!e):
import re

s = "aa}}bb"
co = re.compile(r"(?<!e)(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3))  # now matches the brace at index 3
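If you need every unescaped brace from some offset onwards, the same lookbehind pattern can be combined with the pos argument of finditer, still without slicing. This is only a sketch: the helper name unescaped_brace_positions and the sample string are made up for illustration, and it assumes a single-character escape e as in the question.

```python
import re

# A brace counts as unescaped when it is not directly preceded by "e".
UNESCAPED_BRACE = re.compile(r"(?<!e)\}")

def unescaped_brace_positions(text, start=0):
    """Return the indices of unescaped '}' in text, searching from start.

    Hypothetical helper: it passes start as pos, so no copy of text is made.
    """
    return [m.start() for m in UNESCAPED_BRACE.finditer(text, start)]

sample = "a}e}b}"  # braces at indices 1, 3 and 5; the one at 3 is escaped

from_beginning = unescaped_brace_positions(sample)
from_index_3 = unescaped_brace_positions(sample, 3)

print(from_beginning)
print(from_index_3)
```

Starting at index 3 still skips the escaped brace, because the lookbehind can see the e at index 2 even though the search begins after it. Note that this treats any brace preceded by e as escaped, so an escaped escape character (such as "ee}") would need a more elaborate pattern.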