I am having some difficulty writing a regex that finds words in a text that contain zz, but not at the start or the end of the word. These are two of my many attempts:
pattern = re.compile(r'(?!(?:z){2})[a-z]*zz[a-z]*(?!(?:z){2})')
pattern = re.compile(r'\b[^z\s\d_]{2}[a-z]*zz[a-y][a-z]*(?!(?:zz))\b')
Thanks
Answer:
Well, the direct translation would be
\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b
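A minimal check of this pattern with Python's re.findall. It assumes the two repeated groups carry the + quantifier and reuses the sample sentence from this answer. Each group consumes word characters only while the next two characters are not a word-final zz, so buzz cannot match, and the leading (?!zz) rules out zztop.

```python
import re

# Pattern from the answer, assuming both repeated groups use the + quantifier
PATTERN = re.compile(r"\b(?!zz)(?:(?!zz\b)\w)+zz(?:(?!zz\b)\w)+\b")

text = "lorem ipsum buzz mezzo mix zztop but this is all"
matches = PATTERN.findall(text)  # all groups are non-capturing, so whole matches are returned
print(matches)  # ['mezzo']
```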
Programmatically, you could use
text = "lorem ipsum buzz mezzo mix zztop but this is all"
# keep words that contain "zz" but neither start nor end with it
words = [word
         for word in text.split()
         if not (word.startswith("zz") or word.endswith("zz")) and "zz" in word]
print(words)
This yields
['mezzo']
Answer:
Another idea is to use non-word boundaries.
\B matches at any position between two word characters as well as at any position between two non-word characters ...
\w*\Bzz\B\w*
Be aware that the above also matches words where zz is part of a longer run of z, such as buzzz. For exactly two:
\w*(?<=[^\Wz])zz(?=[^\Wz])\w*
Use either pattern with the (?i) flag for case-insensitive matching if needed.
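A short Python sketch contrasting the two patterns. The sample words are made up for illustration; mezzzo in particular shows that \B alone accepts a zz inside a longer run of z, while the lookaround version requires a non-z word character on both sides.

```python
import re

text = "mezzo mezzzo buzz zztop"

# \B on both sides: zz must be inside the word, but may belong to a longer z run
at_least_two = re.findall(r"\w*\Bzz\B\w*", text)

# Lookarounds demand a word character other than z on both sides: exactly two z
exactly_two = re.findall(r"\w*(?<=[^\Wz])zz(?=[^\Wz])\w*", text)

print(at_least_two)  # ['mezzo', 'mezzzo']
print(exactly_two)   # ['mezzo']
```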
Answer:
You can use lookarounds:
\b(?!zz)\w+?zz\w+\b(?<!zz)
or, without lookarounds:
\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b
Restricted to lowercase ASCII letters, this last pattern can also be written as:
\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b
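A quick Python sketch running all three patterns over the same sample. zizzle is a made-up word, included to show that a word starting with a single z is still accepted, since only a leading or trailing zz is excluded.

```python
import re

text = "mezzo buzz zztop zizzle"

patterns = {
    "lookarounds": r"\b(?!zz)\w+?zz\w+\b(?<!zz)",
    "no lookarounds": r"\bz?[^\Wz]\w*?zz\w*[^\Wz]z?\b",
    "lowercase ASCII": r"\bz?[a-y][a-z]*?zz[a-z]*[a-y]z?\b",
}

# Run every pattern over the sample and collect the matched words
results = {name: re.findall(p, text) for name, p in patterns.items()}
for name, found in results.items():
    print(f"{name}: {found}")  # each prints ['mezzo', 'zizzle']
```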
Answer:
Your criteria just mean that the first and last letters cannot be z. So we simply have to make sure the first and last letters are not z, and that there is a zz somewhere in the text.
Something like
^[^z].*zz.*[^z]$
should work.
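A short Python sketch of how this pattern behaves, assuming it is used with re.search. Applied to the whole sample sentence, the anchors make it test the sentence as a single unit. Applied to each word, it selects words, but it also drops a word that starts or ends with a single z (zizzle, a made-up word, is rejected). That is stricter than excluding only a leading or trailing zz.

```python
import re

PATTERN = r"^[^z].*zz.*[^z]$"

text = "lorem ipsum buzz mezzo mix zztop but this is all"

# Applied to the whole string, ^...$ tests the sentence as a single unit
whole_text_matches = re.search(PATTERN, text) is not None

# Applied word by word, it keeps words whose first and last letters are not z
words = "mezzo buzz zztop zizzle".split()
per_word = [w for w in words if re.search(PATTERN, w)]

print(whole_text_matches)  # True
print(per_word)            # ['mezzo']
```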