Setting the References header on outgoing Gmail SMTP email in Python 3

I'm having difficulty setting the References: field in the header of an outgoing SMTP email for Gmail. I'm using Python 3.8 with smtplib and email.message libraries. The code is:

import smtplib
import email.message

reference_ids = [
    '<BN8PR17MB27372595A957D7912CEE184FBF6F9@BN8PR17MB2737.namprd17.prod.outlook.com>',
    '<CAM9Ku=FZ5RGMvw3VzNrZz [email protected]>',
    '<BN8PR17MB27371C71A65834531DF028BBBF6F9@BN8PR17MB2737.namprd17.prod.outlook.com>',
    '<CAM9Ku=E1wmpj=AMRhsh-Sk1RHqmK_x-J5ey8szVehefYQvn13w@mail.gmail.com>']
in_reply_to = reference_ids[0]

smtp = smtplib.SMTP_SSL(es.smtp_server)
smtp.login(es.username, es.password)
msg = email.message.EmailMessage()
if reference_ids:
    msg.add_header('In-Reply-To', in_reply_to)
    msg.add_header('References', ' '.join(reference_ids))
msg['Subject'] = request.vars.subject
msg['From'] = es.email
msg['To'] = request.vars.to
msg['CC'] = request.vars.cc
msg['BCC'] = request.vars.bcc
msg.set_content(request.vars.message)
smtp.send_message(msg)
smtp.quit()

where reference_ids is the list of Message-IDs in the thread, collected by following each message's In-Reply-To back to the originating email.

I can send the email without errors, and when I view "Show original" the References look OK: a proper list of Message-IDs of the form "<local-part@domain>", without quotes, separated by spaces.

However, when I later try to read the sent email with the imaplib library and email.message_from_bytes(raw_email_response_body[1]), I get a real mess of characters. Most of the Message-IDs in the References header lose their "<local-part@domain>" form. Message-ID and In-Reply-To look OK, though.

References: =?utf-8?q?=22=3CBN8PR17MB27372595A957D7912CEE184FBF6F9=40BN8PR17?=
 =?utf-8?q?MB2737=2Enamprd17=2Eprod=2Eoutlook=2Ecom=3E?=
 <CAM9Ku=FZ5RGMvw3VzNrZz [email protected]>
 =?utf-8?q?=3CBN8PR17MB27371C71A65834531DF028BBBF6F9=40BN8PR17MB2737=2Enampr?=
 =?utf-8?q?d17=2Eprod=2Eoutlook=2Ecom=3E_=3CBN8PR17MB27377F609B669D0E72638D6?=
 =?utf-8?q?9BF6F9=40BN8PR17MB2737=2Enamprd17=2Eprod=2Eoutlook=2Ecom=3E?=
 <CAM9Ku=E1wmpj=AMRhsh-Sk1RHqmK_x-J5ey8szVehefYQvn13w@mail.gmail.com>

Am I encoding the References properly? Am I decoding the References I read from IMAP properly?

CodePudding user response：

Thanks to everyone for giving me direction, especially tripleee with "That's bog-standard RFC2047 encoding.", which led me to the email.header library.

So, when I send the email via Gmail SMTP, I can set the References line to simply:

msg.add_header('References', ' '.join(reference_ids))

where reference_ids is a Python list of plain Message-ID strings, like:

['<BN8PR17MB27372595A957D7912CEE184FBF6F9@BN8PR17MB2737.namprd17.prod.outlook.com>', '<CAM9Ku=FZ5RGMvw3VzNrZz [email protected]>', '<BN8PR17MB27371C71A65834531DF028BBBF6F9@BN8PR17MB2737.namprd17.prod.outlook.com>', '<BN8PR17MB27377F609B669D0E72638D69BF6F9@BN8PR17MB2737.namprd17.prod.outlook.com>', '<CAM9Ku=E1wmpj=AMRhsh-Sk1RHqmK_x-J5ey8szVehefYQvn13w@mail.gmail.com>']

Gmail then shows the list under "Show original" as one long line, with the IDs separated by spaces.

The problem comes when you read the message back via imaplib, where the header arrives RFC 2047 encoded. So after reading the email header, I processed it like this:

import imaplib, smtplib, email
from email.header import decode_header, make_header
...
emsg = email.message_from_bytes(raw_email_response_body[1])
...
references = emsg['References']
if references:
    # Decode any RFC 2047 encoded words, then split on runs of whitespace
    references = str(make_header(decode_header(references)))
    references = references.replace('"', '').split()

This faithfully returns the list of Message-IDs as originally sent. Hopefully this helps a few people out there. Thanks again to everyone for the help and direction. Lucas
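For anyone who wants to try the decoding step without an IMAP connection, here is a minimal sketch. The helper name decode_references and the sample header value are made up for illustration (the IDs are example.com placeholders, not real Message-IDs); only decode_header and make_header come from the standard library.

```python
from email.header import decode_header, make_header

def decode_references(raw_value):
    """Turn a possibly RFC 2047 encoded References value into a list of IDs.

    Hypothetical helper, not part of the email package.
    """
    # decode_header() splits the value into (bytes, charset) chunks and
    # make_header() joins them back into one readable header string.
    decoded = str(make_header(decode_header(raw_value)))
    # Drop stray quotes, then split on any run of whitespace
    # (spaces, tabs, folded line breaks).
    return decoded.replace('"', '').split()

# Made-up sample shaped like the header in the question: a mix of encoded
# words and plain IDs, folded over several lines.
sample = (
    '=?utf-8?q?=3Cfirst=2Eid=40mail=2Eexample=2Ecom=3E?=\r\n'
    ' <second.id@mail.example.org>\r\n'
    ' =?utf-8?q?=3Cthird=2Eid=40mail=2Eexample=2Ecom=3E_=3Cfourth=2Eid=40mail?=\r\n'
    ' =?utf-8?q?=2Eexample=2Ecom=3E?='
)

for message_id in decode_references(sample):
    print(message_id)
```

Calling split() with no argument avoids the chain of replace() calls, since it treats any mix of spaces, tabs and newlines as a single separator.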

CodePudding user response：

You seem to have uncovered a bug in Python's email library. The References: and In-Reply-To: headers should not be subject to RFC2047 encoding at all.

As a quick and dirty demonstration, I can avoid the problem by shortening the long Message-IDs.
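A rough sketch of that demonstration, using only the standard library. The helper name serialized_references and the example.com IDs are invented for illustration. Whether the long variant actually comes out RFC 2047 encoded may depend on the Python version, so the code simply prints both results for comparison.

```python
from email.message import EmailMessage

def serialized_references(message_ids):
    """Serialize a message carrying only a References header.

    Hypothetical helper; it uses the default policy of EmailMessage,
    which folds header lines at 78 characters.
    """
    msg = EmailMessage()
    msg['References'] = ' '.join(message_ids)
    # as_string() applies the policy's header folding rules
    return msg.as_string()

# Short placeholder IDs: the whole header fits on one line.
short_ids = ['<a1@example.com>', '<b2@example.com>']

# A placeholder ID that is longer than the 78-character limit on its own.
long_ids = ['<' + 'X' * 90 + '@example.com>', '<b2@example.com>']

print(serialized_references(short_ids))
print(serialized_references(long_ids))
```

If the second output contains =?utf-8?q?...?= chunks, the installed Python version still folds over-long IDs by encoding them.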

As a similarly quick and dirty workaround, you can override the email.policy object with a different one which doesn't force these lines to be shortened. This is a slightly obscure corner of the Python email library; the documentation really presupposes a fair amount of prior knowledge of both email in general and Python's email library in particular.

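import email.message
from email import message_from_bytes
from email.policy import default

...

# Raise the fold limit so that long Message-IDs fit on a line unencoded
custom_policy = default.clone(max_line_length=100)

msg = email.message.EmailMessage(policy=custom_policy)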

...
# in your IMAP reader
copy = message_from_bytes(imap_response[1], policy=custom_policy)

Notice, however, that this sets the maximum line length everywhere, not just in the headers. You'd really want to be able to override this setting just for the duration of the addition of these specific headers.

Notice also that message_from_bytes needs a policy= keyword argument to construct a modern EmailMessage object. Without the keyword, you end up creating a legacy email.message.Message object which lacks several of the modern methods of the 3.3 email API.

Here's a demo: https://ideone.com/eEzIxe
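For readers who cannot open the link, here is a self-contained sketch of the same round trip with no SMTP or IMAP involved. The Message-IDs are example.com placeholders, the name wide_policy is made up, and the limit of 200 is an assumption chosen only so that the long placeholder ID fits on one line.

```python
import email
from email.message import EmailMessage
from email.policy import default

# Placeholder Message-IDs; the first is deliberately longer than 78 chars.
reference_ids = [
    '<' + 'A' * 90 + '@mail.example.com>',
    '<short.id@mail.example.org>',
]

# Assumption: 200 is wide enough for the longest ID in this example.
wide_policy = default.clone(max_line_length=200)

msg = EmailMessage(policy=wide_policy)
msg['Subject'] = 'Re: test'
msg['References'] = ' '.join(reference_ids)
msg.set_content('body')

# Serialize as an SMTP client would, then parse as an IMAP reader would.
raw = msg.as_bytes()
parsed = email.message_from_bytes(raw, policy=wide_policy)
legacy = email.message_from_bytes(raw)  # no policy= keyword

recovered = str(parsed['References']).split()
print(recovered == reference_ids)
print(type(parsed).__name__, type(legacy).__name__)
```

With the policy passed in, the parser returns an EmailMessage; without it, the same bytes come back as a legacy Message object.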