Problem with a mail message created by a parser

Time:10-01

If I create a message this way (using real addresses, of course):

import email.message

msg = email.message.EmailMessage()
msg['From'] = "[email protected]"
msg['To'] = "[email protected]"
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

I can successfully send it using smtplib. No problem with the Unicode characters in the body. The received message has these headers:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

If I try to create the same message in this alternative way:

msgsource = """\
From: [email protected]
To: [email protected]
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""

import email.parser
import email.policy

msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)

I can't send it. send_message() from smtplib fails with

UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 15: ordinal not in range(128)

so it evidently expects ASCII, not Unicode. What causes the difference, and how do I fix it properly?

(code is based on these examples)

Answer:

The error can be avoided by encoding msgsource and then parsing the resulting bytes:

import email
import email.policy

msgsource = msgsource.encode('utf-8')  # parse bytes instead of str
msg = email.message_from_bytes(msgsource, policy=email.policy.default)
print(msg)

outputs

From: [email protected]
To: [email protected]
Subject: Ayons asperges pour le =?unknown-8bit?q?d=C3=A9jeuner?=

Cela ressemble �� un excellent recipie d��jeuner.

Sending it to Python's SMTP DebuggingServer produces

b'From: [email protected]'
b'To: [email protected]'
b'Subject: Ayons asperges pour le d\xc3\xa9jeuner'
b'X-Peer: ::1'
b''
b'Cela ressemble \xc3\xa0 un excellent recipie d\xc3\xa9jeuner.'
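
The difference between the two parsing routes can be checked without an SMTP server. The sketch below is only an illustration: it assumes that serializing with as_bytes() is a fair stand-in for what send_message() does internally, the helper name can_flatten is made up for this example, and the example.com addresses are placeholders for the elided ones.

```python
import email
import email.policy

# Placeholder addresses (assumption); the post does not show the real ones.
SOURCE = """\
From: alice@example.com
To: bob@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""


def can_flatten(msg):
    """Return True if msg can be serialized to bytes (hypothetical helper)."""
    try:
        msg.as_bytes()
    except UnicodeEncodeError:
        return False
    return True


# Route 1: parse the text directly, as in the question.
from_str = email.message_from_string(SOURCE, policy=email.policy.default)

# Route 2: encode first, then parse the bytes, as in this answer.
from_bytes = email.message_from_bytes(
    SOURCE.encode("utf-8"), policy=email.policy.default
)

print("parsed from str:  ", can_flatten(from_str))
print("parsed from bytes:", can_flatten(from_bytes))
```

The message parsed from bytes keeps the original UTF-8 bytes of the body, which is why it can be written back out even though no charset was ever declared.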

Note that no encoding headers (Content-Type, Content-Transfer-Encoding, MIME-Version) are written for the parsed message. My guess is that the parsers try to reproduce the message from the source string or bytes as faithfully as possible, making as few additional assumptions as possible. The Parser docs

[Parser is] an API that can be used to parse a message when the complete contents of the message are available in a [string/bytes/file]

seem to me to support this interpretation.
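
If the goal is a message that carries proper MIME headers, like the one built with EmailMessage in the question, one possible approach is to parse the source only to extract its parts and then rebuild the message. This is a sketch under my own assumptions, not something taken from the docs quoted above: the function name rebuild is hypothetical, the addresses are placeholders, and it only handles a simple single-part text message.

```python
import email
import email.policy
from email.message import EmailMessage

# Placeholder addresses (assumption); the post does not show the real ones.
SOURCE = """\
From: alice@example.com
To: bob@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""


def rebuild(source):
    """Parse plain-text source, then re-create it as a fresh EmailMessage.

    Hypothetical helper: copies every parsed header and lets set_content()
    choose the charset and transfer encoding for the body.
    """
    parsed = email.message_from_string(source, policy=email.policy.default)
    rebuilt = EmailMessage()
    for name, value in parsed.items():
        rebuilt[name] = value
    # get_payload() returns the body text unchanged for a str-parsed message.
    rebuilt.set_content(parsed.get_payload())
    return rebuilt


msg = rebuild(SOURCE)
raw = msg.as_bytes()  # serializes without UnicodeEncodeError
print(msg["Content-Type"])
print(msg["MIME-Version"])
```

Because the headers are assigned through the EmailMessage API rather than kept as raw parsed text, the library is free to encode the non-ASCII Subject when the message is serialized.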
