Writing a regex expression that finds 'zz' in a word but not at the start and the end

Time:12-05

I am having some difficulty writing a regular expression that finds words in a text that contain zz, but not at the start or the end of the word. These are two of my many attempts:

pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')

Thanks

CodePudding user response:

Well, the direct translation would be

\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b

See a demo on regex101.com.
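
A minimal Python sketch of how this pattern could be used, assuming both groups are quantified with `+`. The sample sentence and the names `PATTERN` and `matches` are made up for illustration:

```python
import re

# Assumed form of the pattern: each group is repeated with `+`.
# (?!zz) rejects words that start with zz; (?!zz\b) stops the
# character loops from running into a zz that ends the word.
PATTERN = re.compile(r'\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b')

# Sample sentence chosen for illustration only.
text = "lorem ipsum buzz mezzo mix zztop pizza"

# The pattern has no capturing groups, so findall returns whole matches.
matches = PATTERN.findall(text)
print(matches)  # ['mezzo', 'pizza']
```

Here "buzz" is skipped because zz ends the word, and "zztop" is skipped because zz starts it.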


Programmatically, you could use

text = "lorem ipsum buzz mezzo mix zztop but this is all"

words = [word 
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]

print(words)

Which yields

['mezzo']

See a demo on ideone.com.

CodePudding user response:

Another idea is to use non-word boundaries.

\B matches at any position between two word characters as well as at any position between two non-word characters ...

\w*\Bzz\B\w*

See this demo at regex101


Be aware that the pattern above also matches words with more than two consecutive z's. To match exactly two:

\w*(?<=[^\Wz])zz(?=[^\Wz])\w*

Another demo at regex101


Use any of those patterns with the (?i) flag for caseless matching if needed.
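
To compare the two patterns side by side, here is a small Python sketch. The sample text and the variable names (`inner_zz`, `exactly_zz`, and so on) are assumptions made for illustration:

```python
import re

# zz with a word character on both sides (\B = not a word boundary).
inner_zz = re.compile(r'\w*\Bzz\B\w*')
# zz with a non-z word character on both sides, so "zzz" is rejected.
exactly_zz = re.compile(r'\w*(?<=[^\Wz])zz(?=[^\Wz])\w*')

# Sample text chosen for illustration only.
text = "buzz mezzo zztop pizzza Fuzzy"

inner_matches = inner_zz.findall(text)
exact_matches = exactly_zz.findall(text)
# The inline (?i) flag must sit at the very start of the pattern.
caseless_matches = re.findall(r'(?i)\w*\Bzz\B\w*', "MEZZO ZZTOP")

print(inner_matches)     # ['mezzo', 'pizzza', 'Fuzzy']
print(exact_matches)     # ['mezzo', 'Fuzzy']
print(caseless_matches)  # ['MEZZO']
```

The word "pizzza" shows the difference: the first pattern accepts it, the second does not.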

CodePudding user response:

You can use lookarounds:

\b(?!zz)\w+?zz\w+\b(?<!zz)

or not:

\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b

Limited to ASCII letters this last pattern can also be written:

\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
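
A quick Python sketch that runs the three variants on the same input, assuming the lookaround version uses `\w+?` and `\w+` as its quantifiers. The sample text and variable names are illustrative only:

```python
import re

# Lookaround version (quantifiers assumed to be +? and +).
lookaround = re.compile(r'\b(?!zz)\w+?zz\w+\b(?<!zz)')
# Version without lookarounds: an optional single z is allowed at each
# edge, but it must be followed (or preceded) by a non-z word character.
no_lookaround = re.compile(r'\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b')
# Same idea restricted to lowercase ASCII letters.
ascii_only = re.compile(r'\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b')

# Sample text chosen for illustration only.
text = "buzz mezzo zztop pizza zazzy"

results = [p.findall(text) for p in (lookaround, no_lookaround, ascii_only)]
for found in results:
    print(found)  # ['mezzo', 'pizza', 'zazzy'] each time
```

Note that "zazzy" is accepted by all three, since it starts with a single z rather than zz.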

CodePudding user response:

Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure the first and last letters are not z, and that there is a zz somewhere in the text.

Something like

^[^z].*zz.*[^z]$

should work
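
A rough Python sketch of this approach. Because the pattern is anchored with ^ and $, it is applied here to one word at a time; splitting on whitespace is an assumption about how the input is tokenized, and the sample text is made up:

```python
import re

# Anchored pattern: first and last characters must not be z,
# with zz somewhere in between.
anchored = re.compile(r'^[^z].*zz.*[^z]$')

# Sample text chosen for illustration only.
text = "buzz mezzo zztop pizza zazzy"

# Test each whitespace-separated word on its own.
words = [w for w in text.split() if anchored.search(w)]
print(words)  # ['mezzo', 'pizza']
```

Note that this rejects any word that begins or ends with even a single z (such as "zazzy"), so it is somewhat stricter than only excluding zz at the edges.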
