The Python 3.6 `email` module crashes with this error:
```
Traceback (most recent call last):
  File "empty-eml.py", line 9, in <module>
    for part in msg.iter_attachments():
  File "/usr/lib/python3.6/email/message.py", line 1055, in iter_attachments
    parts = self.get_payload().copy()
AttributeError: 'str' object has no attribute 'copy'
```
The crash can be reproduced with this EML file:
```
From: "[email protected]" <[email protected]>
To: <[email protected]>
Subject: COURRIER EMIS PAR PACIFICA
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_Part_3181_1274694650.1556805728023"
Date: Thu, 2 May 2019 16:02:08 +0200
```
and this minimal piece of code:
```python
from email import policy
from email.parser import Parser
from sys import argv

with open(argv[1]) as eml_file:
    msg = Parser(policy=policy.default).parse(eml_file)

for part in msg.iter_attachments():
    pass
```
I believe it has something to do with the Content-Type being `multipart/mixed` while the email body is empty, which causes `get_payload()` to return a `str`. However, I am not sure whether such an EML is forbidden by the standard (though I have many such samples), whether this is a bug in the `email` module, or whether I am using the code wrong.
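For a self-contained reproduction without an EML file on disk, the sketch below parses the same headers from a string. The addresses are hypothetical placeholders (the originals are obfuscated above), and the `+` sign in the `Date` offset is assumed. It shows that the parsed message still claims to be `multipart/mixed` while its payload is an empty `str` rather than a list of subparts.

```python
from email import policy
from email.parser import Parser

# Hypothetical stand-in for the EML file above: headers only, no body.
# The addresses are placeholders; the boundary is the one from the question.
SAMPLE_EML = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: COURRIER EMIS PAR PACIFICA\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="----=_Part_3181_1274694650.1556805728023"\n'
    "Date: Thu, 2 May 2019 16:02:08 +0200\n"
)

msg = Parser(policy=policy.default).parsestr(SAMPLE_EML)

print(msg.get_content_type())   # multipart/mixed
print(msg.is_multipart())       # False: the payload is not a list of parts
print(repr(msg.get_payload()))  # '' (an empty str, which has no .copy())
```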
Answer:
If you change the policy to `strict`:

```python
Parser(policy=policy.strict).parse(eml_file)
```
the parser raises `email.errors.StartBoundaryNotFoundDefect`, described in the docs as:

> `StartBoundaryNotFoundDefect`: The start boundary claimed in the Content-Type header was never found.
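Because `policy.strict` is `policy.default` with `raise_on_defect=True`, the defect instance itself is raised as an exception (defect classes derive from `ValueError` via `MessageDefect`). A minimal sketch of catching it, assuming the same headers-only sample with placeholder addresses; `parse_strict` is a hypothetical helper, not part of the library:

```python
from email import errors, policy
from email.parser import Parser

# Hypothetical headers-only sample mirroring the question's EML
# (placeholder addresses, original boundary).
SAMPLE_EML = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="----=_Part_3181_1274694650.1556805728023"\n'
)

def parse_strict(text):
    """Hypothetical helper: return (message, None) or (None, defect)."""
    try:
        return Parser(policy=policy.strict).parsestr(text), None
    except errors.MessageDefect as defect:
        # With raise_on_defect=True the parser raises the defect object.
        return None, defect

msg, defect = parse_strict(SAMPLE_EML)
print(type(defect).__name__)  # StartBoundaryNotFoundDefect
```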
If you parse the message with `policy.default` and inspect its `defects` afterwards, you will find two defects:

```
[StartBoundaryNotFoundDefect(), MultipartInvariantViolationDefect()]
```
> `MultipartInvariantViolationDefect`: A message claimed to be a multipart, but no subparts were found. Note that when a message has this defect, its `is_multipart()` method may return false even though its content type claims to be multipart.
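Both defects can be confirmed with the default policy, which records defects on `msg.defects` instead of raising them. This is a sketch assuming the same headers-only sample with placeholder addresses:

```python
from email import policy
from email.parser import Parser

# Hypothetical headers-only sample mirroring the question's EML.
SAMPLE_EML = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="----=_Part_3181_1274694650.1556805728023"\n'
)

msg = Parser(policy=policy.default).parsestr(SAMPLE_EML)

# msg.defects is a plain list of defect instances collected while parsing.
defect_names = [type(d).__name__ for d in msg.defects]
print(defect_names)
# ['StartBoundaryNotFoundDefect', 'MultipartInvariantViolationDefect']
print(msg.is_multipart())  # False, as the docs warn
```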
A consequence of the `StartBoundaryNotFoundDefect` is that the parser gives up on the multipart structure and sets the message payload to the body it has captured so far. In this case that is nothing, so the payload is an empty string, which causes the exception you see when you run your code.
Arguably, the fact that Python doesn't check whether the payload is a list before calling `copy()` on it is a bug.
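The missing check can be added on the caller's side. The sketch below uses a hypothetical wrapper, `safe_iter_attachments`, that only delegates to `iter_attachments()` when `is_multipart()` confirms the payload really is a list; the sample message and the well-formed comparison message are illustrative assumptions:

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser

# Hypothetical headers-only sample mirroring the question's EML.
SAMPLE_EML = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="----=_Part_3181_1274694650.1556805728023"\n'
)

def safe_iter_attachments(msg):
    """Hypothetical wrapper: skip messages whose payload is not a list."""
    if not msg.is_multipart():
        # Malformed "multipart" message with a str payload: no attachments.
        return iter(())
    return msg.iter_attachments()

broken = Parser(policy=policy.default).parsestr(SAMPLE_EML)
print(list(safe_iter_attachments(broken)))  # []

# A well-formed message for comparison: a text body plus one attachment.
good = EmailMessage()
good.set_content("hello")
good.add_attachment(b"data", maintype="application",
                    subtype="octet-stream", filename="a.bin")
print([p.get_filename() for p in safe_iter_attachments(good)])  # ['a.bin']
```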
In practice, you have to handle these messages by wrapping the iteration over attachments in a `try`/`except`, by conditioning the iteration on the contents of `msg.defects`, or by parsing with `policy.strict` and discarding every message that reports a defect.
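The first two strategies might look roughly like this. The function names and the choice of which defects count as a broken multipart structure are assumptions for illustration, and the sample again uses placeholder addresses:

```python
from email import errors, policy
from email.parser import Parser

# Hypothetical headers-only sample mirroring the question's EML.
SAMPLE_EML = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="----=_Part_3181_1274694650.1556805728023"\n'
)

# Option 1: wrap the iteration. On Python 3.6 the malformed payload
# surfaces as AttributeError from iter_attachments().
def attachments_or_empty(msg):
    try:
        return list(msg.iter_attachments())
    except AttributeError:
        return []

# Option 2: check msg.defects first and skip messages whose multipart
# structure is known to be broken (assumed set of "fatal" defects).
BROKEN_MULTIPART = (errors.StartBoundaryNotFoundDefect,
                    errors.MultipartInvariantViolationDefect)

def attachments_if_well_formed(msg):
    if any(isinstance(d, BROKEN_MULTIPART) for d in msg.defects):
        return None  # caller decides how to treat the defective message
    return list(msg.iter_attachments())

msg = Parser(policy=policy.default).parsestr(SAMPLE_EML)
print(attachments_or_empty(msg))        # []
print(attachments_if_well_formed(msg))  # None
```

Catching `AttributeError` is broad and could mask unrelated bugs inside the loop body, so the defects check is the more explicit of the two.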