Python Negative Lookbehind with a variable number of characters

Time:10-06

I know there are many regex and negative lookbehind questions, but I cannot find an answer to this one. I want to find instances of water, but not when never appears before it, with a variable number of characters between the two words. The number of characters between them is unbounded, and lookbehind does not allow variable-length patterns. My current pattern does reject never, but it rejects it even when never appears at the very start of the text, far away from water. Is there a way to limit a lookbehind to only 20 or 30 characters? What I have:

(?i)^(?=.*?(?:water))(?:(?!never).)*$

Just some of the examples I am working with:

water                                                         (match)
I have water                                                  (match)
I never have water                                            (no match)
Where is the water.                                           (match)
I never have food or water                                    (no match)
I never have food but I always have water                     (match)
I never have food or chips. I like to walk. I have water      (match)

Again, the problem is that I could have a paragraph that is 10 sentences long, and if never appears anywhere in it, the pattern will not find water. As far as I can tell, lookbehind does not accept variable-length patterns. I appreciate any help you can give.
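
The behaviour described above can be reproduced with a short check. This is only a sketch: the three sample lines come from the list above, and the names `QUESTION_PATTERN`, `samples`, and `results` are made up for illustration.

```python
import re

# Pattern from the question: require "water" somewhere on the line, then
# consume the whole line while refusing any position where "never" starts.
QUESTION_PATTERN = re.compile(r"(?i)^(?=.*?(?:water))(?:(?!never).)*$")

samples = [
    "I have water",
    "I never have water",
    "I never have food but I always have water",
]

# Map each sample line to whether the pattern matched it.
results = {line: bool(QUESTION_PATTERN.search(line)) for line in samples}

for line, matched in results.items():
    print(f"{matched!s:5}  {line}")
```

The third line is rejected even though never is far from water, because the tempered dot `(?:(?!never).)*` forbids never anywhere on the line.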

CodePudding user response:

You can use this regex with Python's built-in re module:

(?i)^(?!.*\bnever\b.{,20}\bwater\b).*\bwater\b


RegEx Details:

  • (?i): Enable ignore case mode
  • ^: Start of the string
  • (?!.*\bnever\b.{,20}\bwater\b): Negative lookahead condition. It fails the match if the word never is followed by the word water with at most 20 characters between them.
  • .*\bwater\b: Find word water anywhere in the line
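
As a quick check, the pattern above can be run against the sample lines from the question. This is a minimal sketch: the helper name `has_water` and the `EXPECTED` table are illustrative, and the expected values are the (match) labels from the question.

```python
import re

# Pattern from this answer: reject the line if "never" is followed by
# "water" with at most 20 characters in between, then look for "water".
PATTERN = re.compile(r"(?i)^(?!.*\bnever\b.{,20}\bwater\b).*\bwater\b")

# Sample lines from the question, mapped to the outcome the asker wants.
EXPECTED = {
    "water": True,
    "I have water": True,
    "I never have water": False,
    "Where is the water.": True,
    "I never have food or water": False,
    "I never have food but I always have water": True,
    "I never have food or chips. I like to walk. I have water": True,
}


def has_water(line):
    """Return True if the pattern accepts the line."""
    return PATTERN.search(line) is not None


for line, expected in EXPECTED.items():
    assert has_water(line) == expected, line
```

Two caveats apply. The lookahead scans the whole line from the start, so a single never ... water pair within 20 characters rejects the line even if another water appears later. And without re.DOTALL, `.` does not cross newlines, so only the first line of a multi-line string is examined.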

CodePudding user response:

Variable-length negative lookbehind is not supported by Python's re module. What you can do instead is check whether never appears before water, and return False in that case. For example:

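import re

def test(string):
    # Reject the string if "never" appears anywhere before "water".
    if re.match(r'.*never.*water.*', string):
        return False
    # Otherwise accept it as long as "water" is present.
    elif re.match(r'.*water.*', string):
        return True
    else:
        # No "water" at all, so there is nothing to find.
        return False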
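
The function above rejects never at any distance before water. To limit the check to a fixed window, as the question asks, one option is to locate each word separately and compare positions. This is a sketch under assumptions: the function name `water_hits` and the default `window=20` are illustrative, and the gap is measured from the end of never to the start of water.

```python
import re

WATER = re.compile(r"\bwater\b", re.IGNORECASE)
NEVER = re.compile(r"\bnever\b", re.IGNORECASE)


def water_hits(text, window=20):
    """Return the start offsets of every "water" that has no "never"
    ending within `window` characters before it."""
    never_ends = [m.end() for m in NEVER.finditer(text)]
    hits = []
    for m in WATER.finditer(text):
        # A "never" blocks this occurrence if the gap between its end
        # and the start of "water" is between 0 and `window` characters.
        blocked = any(0 <= m.start() - end <= window for end in never_ends)
        if not blocked:
            hits.append(m.start())
    return hits


print(water_hits("I never have water. But today I have water"))
```

Because each occurrence of water is judged on its own and no `.` is involved, this also works on multi-line paragraphs and keeps a later water even when an earlier one is blocked.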