Matching the start of a substring in Python RegEx

Time:09-04

I'm trying to match closing curly braces in Python unless they're escaped. The test case I'm using is this:

import re

s  = "aa}}bb"
co = re.compile(r"(^|[^e])(})")
print(s[0:], co.search(s))
print(s[1:], co.search(s, 1))
print(s[2:], co.search(s, 2))
print(s[3:], co.search(s, 3))  # This outputs None?!

The intent of the RegEx pattern is "either there is no character in front of the curly brace, or there is a character that isn't the escape (here e)". I thought the last search would operate on s[3:] == "}bb". It doesn't match the pattern, however, and although this is quite strange, I guess this is because

  1. the search started at index 3 knows that this index is not the start of the line, so ^ doesn't match there, and
  2. it can't use the character in front of index 3 for [^e] either.

In other words: s[3:] is not actually what's being searched. One workaround I see is to call co.search(s[3:]), which gives me the start-of-line. I'd rather use search's start-index argument instead of slicing, because I'm working with big strings and slicing copies memory. Can it be done?
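
For concreteness, here is a minimal check of the difference between the two calls, using the same s and pattern as in the test case. The names with_pos and with_slice are only illustrative.

```python
import re

s = "aa}}bb"
co = re.compile(r"(^|[^e])(})")

# Searching from index 3 keeps the original string, so ^ cannot match there
# and [^e] has to consume a character at index 3 or later.
with_pos = co.search(s, 3)

# Slicing builds a new string whose real beginning is the brace.
with_slice = co.search(s[3:])

print(with_pos)           # None
print(with_slice.span())  # (0, 1), relative to the sliced string
```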

CodePudding user response:

  • Yes, that's documented behaviour:

    pos ... is not completely equivalent to slicing the string; the ^ pattern character matches at the real beginning of the string ... but not necessarily at the index where the search is to start.

    https://docs.python.org/3/library/re.html#re.Pattern.search

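  • What you probably want here is a "negative lookbehind", which is written (?<!...); with an escape of e that'd be (?<!e). It checks the character before the brace without consuming it, and it still sees characters that come before the start index.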

    import re

    s  = "aa}}bb"
    co = re.compile(r"(?<!e)(})")
    print(s[0:], co.search(s))
    print(s[1:], co.search(s, 1))
    print(s[2:], co.search(s, 2))
    print(s[3:], co.search(s, 3))  # now matches the brace at index 3
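
To check that the lookbehind also respects an escape character sitting just before the start index, a small sketch along these lines may help. The string t with an escaped brace is a made-up input for illustration, not something from the question.

```python
import re

co = re.compile(r"(?<!e)(})")

s = "aa}}bb"
t = "ae}b}"  # hypothetical input: the first brace is escaped by "e"

# The lookbehind is checked against the full string, so no slice is needed.
print(co.search(s, 3))  # matches the brace at index 3
print(co.search(t, 2))  # skips the escaped brace at index 2, matches at index 4

# finditer accepts the same start index, which avoids copying large inputs.
unescaped = [m.start() for m in co.finditer(t, 2)]
print(unescaped)  # [4]
```

Because the start index never creates a new string, match positions stay relative to the original string rather than to a slice.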
    