I have a working regular expression that I use to pull 'cid' image references out of the body of an email. I have tested it with .NET and on regex101.com, and it works in both. In Python, however, I get the expected number of matches, but they are all empty strings. The code is below:
import re

x = re.findall(r"\*?cid\:(.*?)[a-zA-Z0-9\-.@]+.*?", msg.body)
for s in x:
    print(len(s))
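A minimal, self-contained reproduction of the symptom, as a sketch: it assumes a hypothetical HTML body string `body` standing in for `msg.body`, and assumes the space after the character class in the posted pattern was originally a `+` (a common loss when code is copied between sites).

```python
import re

# Hypothetical stand-in for msg.body: an HTML email body with two inline images.
body = ('<img src="cid:image001.png@01D7A1B2.3C4D5E60"> '
        '<img src="cid:logo.jpg@example">')

# The question's pattern, assuming the character class was followed by '+'.
pattern = r"\*?cid\:(.*?)[a-zA-Z0-9\-.@]+.*?"

# With exactly one capturing group, findall() returns that group's text per match.
# The lazy (.*?) first tries to match nothing, and [...]+ can then consume the
# whole ID, so the captured group is empty for every match.
result = re.findall(pattern, body)
for s in result:
    print(len(s))  # prints 0 for each match
```

The match count is right (one per `cid:` reference), which is why the problem looks like an empty result rather than a failed match.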
Output:
The top five lines are the expected matches. The bottom lines are the output from the code above.
Here it is working on regex101.com
What am I missing?
Answer:
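When a pattern contains a capturing group, findall() returns the text of that group for each match instead of the whole match. Because the lazy (.*?) is followed by a character class that can match immediately, that group captures an empty string for a typical cid reference. To avoid this, make the group non-capturing with ?: so that findall() returns the whole match: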
x = re.findall(r"\*?cid\:(?:.*?)[a-zA-Z0-9\-.@]+.*?", msg.body)
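The sketch below compares the non-capturing fix with two other common options, again using a hypothetical `body` string in place of `msg.body` and assuming the character class is followed by `+`. `re.finditer()` yields match objects, and `group(0)` is always the whole match regardless of any groups; moving the group around the ID makes findall() return only the ID.

```python
import re

# Hypothetical stand-in for msg.body.
body = ('<img src="cid:image001.png@01D7A1B2.3C4D5E60"> '
        '<img src="cid:logo.jpg@example">')

# Fix from the answer: (?:...) does not capture, so the pattern has no groups
# and findall() returns each whole match.
non_capturing = re.findall(r"\*?cid\:(?:.*?)[a-zA-Z0-9\-.@]+.*?", body)

# Alternative: keep the original pattern and read group(0), the whole match,
# from each match object returned by finditer().
whole = [m.group(0) for m in re.finditer(r"\*?cid\:(.*?)[a-zA-Z0-9\-.@]+.*?", body)]

# Alternative: put the group around the ID itself, so findall() returns just the ID.
ids = re.findall(r"cid:([a-zA-Z0-9\-.@]+)", body)

print(non_capturing)  # ['cid:image001.png@01D7A1B2.3C4D5E60', 'cid:logo.jpg@example']
print(whole == non_capturing)  # True
print(ids)  # ['image001.png@01D7A1B2.3C4D5E60', 'logo.jpg@example']
```

If the goal is to look up the referenced attachments, capturing only the ID avoids stripping the `cid:` prefix afterwards.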