I receive emails with unique subjects and want to save them. I tried this (the login step with credentials is omitted):
import email
import imaplib

# suka is the imaplib connection (IMAP4 / IMAP4_SSL) opened in the omitted login step
suka.select('Inbox')
key = 'FROM'
value = 'TBD'
_, data = suka.search(None, key, value)
mail_id_list = data[0].split()

msgs = []
for num in mail_id_list:
    typ, data = suka.fetch(num, '(RFC822)')
    msgs.append(data)

# newest first
for msg in msgs[::-1]:
    for response_part in msg:
        if isinstance(response_part, tuple):
            my_msg = email.message_from_bytes(response_part[1])
            print("subj:", my_msg['subject'])
            for part in my_msg.walk():
                # print(part.get_content_type())
                if part.get_content_type() == 'text/plain':
                    print(part.get_payload())
I do get the subjects, but in the form "subj: =?utf-8?B?0LfQsNGP0LLQutCwIDIxXzE0MTIyMg==?=", so they need to be decoded. The question is: which variable has to be decoded, and how?
I also tried another approach:

yek, do = suka.uid('fetch', govno, ('RFC822'))

where govno is the latest email in the inbox. This fails with "can't concat int to bytes".
So, is there a way to decode the subjects so they read the same as in the email client? Thank you.
Answer:
The standard library has a built-in function for this, email.header.decode_header(). Its documentation says:
Decode a message header value without converting the character set. The header value is in header.
This function returns a list of (decoded_string, charset) pairs containing each of the decoded parts of the header. charset is None for non-encoded parts of the header, otherwise a lower case string containing the name of the character set specified in the encoded string.
>>> from email.header import decode_header
>>> decoded_headers = decode_header("=?utf-8?B?0LfQsNGP0LLQutCwIDIxXzE0MTIyMg==?=")
>>> decoded_headers
[(b'\xd0\xb7\xd0\xb0\xd1\x8f\xd0\xb2\xd0\xba\xd0\xb0 21_141222', 'utf-8')]
>>> first = decoded_headers[0]
>>> first[0].decode(first[1])
'заявка 21_141222'
In other words, decode the bytes returned by decode_header() using the charset it returns alongside them.
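If you only need the final readable text, a shorter route (a sketch, not part of the original answer) is email.header.make_header(): it takes the (value, charset) pairs produced by decode_header() and builds a Header object, and calling str() on that object decodes every part with its own charset and joins them. The helper name decode_subject is hypothetical.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Turn an RFC 2047 encoded header value into a readable str (hypothetical helper)."""
    # decode_header() splits the value into (bytes_or_str, charset) pairs;
    # make_header() rebuilds them into a Header, and str() decodes each part
    return str(make_header(decode_header(raw_subject)))


print(decode_subject("=?utf-8?B?0LfQsNGP0LLQutCwIDIxXzE0MTIyMg==?="))
# заявка 21_141222
```

In the question's loop, the value to pass in is my_msg['subject'], for example print("subj:", decode_subject(my_msg['subject'])).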
For the follow-up question, here is a helper function that returns the whole header value when it is made of several parts (for example a long, multiline subject) and that handles decoding errors:
from email.header import decode_header


def get_header(header_text, default='utf-8'):
    try:
        headers = decode_header(header_text)
    except Exception:
        print('Error while decoding header, using the header without decoding')
        return header_text
    header_sections = []
    for text, charset in headers:
        # decode_header() returns a str (not bytes) when the header has no encoded parts
        if isinstance(text, str):
            header_section = text
        else:
            try:
                # if no charset is given, fall back to the default (utf-8)
                header_section = text.decode(charset or default)
            except (LookupError, UnicodeDecodeError):
                # unknown charset or incorrectly encoded header:
                # decode with the default and ignore the decoding errors
                header_section = text.decode(default, errors='ignore')
        if header_section:
            header_sections.append(header_section)
    return ' '.join(header_sections)


print(get_header("=?utf-8?B?0LfQsNGP0LLQutCwIDIxXzE0MTIyMg==?="))
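Another option, assuming Python 3.6 or later: parse the fetched bytes with email.policy.default instead of the legacy default policy. The parser then returns an EmailMessage whose headers are decoded on access, so my_msg['subject'] already reads as plain text, and get_body()/get_content() also take care of the body's transfer encoding and charset. This is a minimal sketch of the question's inner loop body; the function name parse_message and the sample message are made up for illustration.

```python
import email
from email import policy


def parse_message(raw_bytes):
    """Return (subject, plain-text body) for raw RFC 822 bytes,
    e.g. response_part[1] from the question's fetch loop (hypothetical helper)."""
    # With policy.default the parser returns an EmailMessage whose header
    # values are decoded from RFC 2047 encoded words on access
    my_msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    subject = str(my_msg['subject']) if my_msg['subject'] is not None else ''
    # get_body() picks the text/plain part (if any); get_content() undoes the
    # transfer encoding and decodes the charset, returning a str
    body_part = my_msg.get_body(preferencelist=('plain',))
    body = body_part.get_content() if body_part is not None else ''
    return subject, body


# A small hand-made message standing in for data fetched over IMAP
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: =?utf-8?B?0LfQsNGP0LLQutCwIDIxXzE0MTIyMg==?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello\r\n"
)
subject, body = parse_message(raw)
print("subj:", subject)
print(body)
```

With this approach there is no separate decoding step: the variable you print is simply the subject taken from a message parsed with the modern policy.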