If I create a message this way (using real addresses, of course):
```python
import email.message

msg = email.message.EmailMessage()
msg['From'] = "from@example.com"
msg['To'] = "to@example.com"
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")
```
I can send it successfully using `smtplib`, with no problem from the Unicode characters in the body. The received message has these headers:
```
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
```
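As a quick check (a sketch; `from@example.com` and `to@example.com` are placeholder addresses), these MIME headers already exist on the message right after `set_content()`, before `smtplib` is involved at all:

```python
import email.message

msg = email.message.EmailMessage()
msg['From'] = "from@example.com"  # placeholder address
msg['To'] = "to@example.com"      # placeholder address
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

# set_content() adds Content-Type (with a charset parameter) and
# Content-Transfer-Encoding; EmailMessage also adds MIME-Version.
print(msg['Content-Type'])
print(msg['MIME-Version'])
print(msg['Content-Transfer-Encoding'])
```

The transfer encoding printed locally may differ from the `quoted-printable` seen in the received message; what matters here is that the headers are present.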
If I try to create the same message in this alternative way:
```python
import email.parser
import email.policy

msgsource = """\
From: from@example.com
To: to@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)
```
I can't send it: `send_message()` from `smtplib` fails with
```
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 15: ordinal not in range(128)
```
so it apparently expects ASCII, not Unicode. What causes the difference, and how do I fix it properly?
(The code is based on these examples.)
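The failure can be reproduced without an SMTP server. This sketch assumes, as in CPython's `smtplib`, that `send_message()` serializes the message with `email.generator.BytesGenerator`; the addresses are placeholders. The parsed message carries no `Content-Type` or `MIME-Version` header, and flattening it to bytes raises the same `UnicodeEncodeError`:

```python
import io
import email.parser
import email.policy
from email.generator import BytesGenerator

msgsource = """\
From: from@example.com
To: to@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)

# The parser only records the headers that are present in the source text.
print(msg['Content-Type'], msg['MIME-Version'])  # None None

error = None
try:
    # Roughly what send_message() does before handing the bytes to the server.
    BytesGenerator(io.BytesIO()).flatten(msg, linesep='\r\n')
except UnicodeEncodeError as exc:
    error = exc
    print('flatten failed:', exc)
```

The body is kept as the plain `str` from the source, and `BytesGenerator` writes such text with the ASCII codec, which fails at `à`.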
Answer:
The error can be avoided by encoding `msgsource` and then parsing the resulting bytes:
```python
import email
from email import policy

msgsource = msgsource.encode('utf-8')
msg = email.message_from_bytes(msgsource, policy=policy.default)
print(msg)
```
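This prints:
```
From: from@example.com
To: to@example.com
Subject: Ayons asperges pour le =?unknown-8bit?q?d=C3=A9jeuner?=

Cela ressemble �� un excellent recipie d��jeuner.
```
Each `�` is the Unicode replacement character, shown in place of one raw UTF-8 byte that the string form cannot represent.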
Sending it to Python's SMTP `DebuggingServer` produces:
```
b'From: from@example.com'
b'To: to@example.com'
b'Subject: Ayons asperges pour le d\xc3\xa9jeuner'
b'X-Peer: ::1'
b''
b'Cela ressemble \xc3\xa0 un excellent recipie d\xc3\xa9jeuner.'
```
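The same bytes can be produced without a `DebuggingServer` (a sketch with placeholder addresses, again assuming `BytesGenerator` is the serializer `send_message()` uses). Parsing bytes stores each non-ASCII byte as a surrogate escape, which `BytesGenerator` writes back out unchanged, so flattening succeeds:

```python
import io
import email
from email import policy
from email.generator import BytesGenerator

msgsource = """\
From: from@example.com
To: to@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
msg = email.message_from_bytes(msgsource.encode('utf-8'), policy=policy.default)

buf = io.BytesIO()
BytesGenerator(buf).flatten(msg, linesep='\r\n')
flat = buf.getvalue()

# Print line by line as bytes, mirroring the DebuggingServer output format.
for line in flat.splitlines():
    print(line)
```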
Note that no encoding headers (`Content-Type`, `Content-Transfer-Encoding`, `MIME-Version`) are written. My guess is that the parsers try to reproduce the message from the source string or bytes as faithfully as possible, making as few additional assumptions as possible. The `Parser` documentation describes it as

> [Parser is] an API that can be used to parse a message when the complete contents of the message are available in a [string/bytes/file]

which seems to me to support this interpretation.
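One way to end up with the same MIME headers as in the first approach, sketched here under the assumption that the message has a single plain-text body, is to copy the parsed headers into a fresh `EmailMessage` and set the body with `set_content()`. `rebuild()` is a hypothetical helper and the addresses are placeholders:

```python
import io
import email.parser
import email.policy
from email.generator import BytesGenerator
from email.message import EmailMessage


def rebuild(parsed):
    """Hypothetical helper: copy headers and re-set the text body of a parsed message."""
    new = EmailMessage(policy=email.policy.default)
    for name, value in parsed.items():
        # Assigning through EmailMessage stores a header object, which is
        # folded (with encoded words for non-ASCII text) when serialized.
        new[name] = value
    # set_content() adds Content-Type (with charset), Content-Transfer-Encoding
    # and MIME-Version. Assumes a single text/plain body, not multipart.
    new.set_content(parsed.get_payload())
    return new


msgsource = """\
From: from@example.com
To: to@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
parsed = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)
msg = rebuild(parsed)

buf = io.BytesIO()
BytesGenerator(buf).flatten(msg, linesep='\r\n')  # no UnicodeEncodeError here
flat = buf.getvalue()
print(msg['Content-Type'])
```

Passing `msg` to `send_message()` should then behave like the first approach; multipart sources would need different handling.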