I am using the following code to retrieve the contents of my emails. From that, I can extract the details of each email.
for i in range(messages, messages-N, -1):
    # fetch the email message by ID
    res, msg = imap.fetch(str(i), "(RFC822)")
    for response in msg:
        if isinstance(response, tuple):
            # parse a bytes email into a message object
            msg = email.message_from_bytes(response[1])
            # decode the email subject
            subject, encoding = decode_header(msg["Subject"])[0]
            if isinstance(subject, bytes):
                # if it's a bytes, decode to str
                subject = subject.decode(encoding)
            # decode email sender
            From, encoding = decode_header(msg.get("From"))[0]
            if isinstance(From, bytes):
                From = From.decode(encoding)
            Date, encoding = decode_header(msg["Date"])[0]
            if isinstance(Date, bytes):
                # if it's a bytes, decode to str
                Date = Date.decode(encoding)
            print("Subject:", subject)
            print("From:", From)
            print("Date:", Date)
            # if the email message is multipart
            if msg.is_multipart():
                # iterate over email parts
                for part in msg.walk():
                    # extract content type of email
                    content_type = part.get_content_type()
                    content_disposition = str(part.get("Content-Disposition"))
                    try:
                        # get the email body
                        body = part.get_payload(decode=True).decode()
                    except:
                        pass
                    if content_type == "text/plain" and "attachment" not in content_disposition:
                        # print text/plain emails and skip attachments
                        print(body)
                    elif "attachment" in content_disposition:
                        # download attachment
                        print("Subject:","This Contains an Attachement")
            else:
                # extract content type of email
                content_type = msg.get_content_type()
                # get the email body
                body = msg.get_payload(decode=True).decode()
                if content_type == "text/plain":
                    # print only text email parts
                    print(body)
            if content_type == "text/html":
                print("Content Type is HTML")
            print("="*100)
But I need to get the values printed by these lines,
print("Subject:", subject)
print("From:", From)
print("Date:", Date)
into a data frame. How should I improve this code? I need the whole output, for all emails, to end up in a single data frame.
CodePudding user response:
If you can use a temporary file or some other kind of storage, write the results to that storage and then load them from there to build the data frame.
If the number of emails is really small, you don't have to optimize anything and can just concatenate every single row onto a data frame. This is bad practice, though, and you should avoid it if possible. With a large number of emails it would cause all kinds of problems, the first being that it would be extremely slow. Write the results to a CSV file or a SQL database and you'll thank yourself later.
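The storage approach above could look roughly like the sketch below, which uses only the standard library `csv` module. It assumes each message has already been reduced to a dict with Subject, From, and Date keys. The helper name `append_rows`, the sample rows, and the file name `emails.csv` are hypothetical and only there for illustration.

```python
import csv
import os
import tempfile

FIELDS = ["Subject", "From", "Date"]


def append_rows(path, rows):
    """Append header dicts to a CSV file, writing the column header only once."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)


# Hypothetical sample data standing in for values pulled out of the IMAP loop.
sample_rows = [
    {"Subject": "Hello", "From": "a@example.com",
     "Date": "Mon, 1 Jan 2024 10:00:00 +0000"},
    {"Subject": "Re: Hello", "From": "b@example.com",
     "Date": "Mon, 1 Jan 2024 11:00:00 +0000"},
]

csv_path = os.path.join(tempfile.mkdtemp(), "emails.csv")
append_rows(csv_path, sample_rows[:1])  # the first batch creates the file
append_rows(csv_path, sample_rows[1:])  # later batches only add rows

# Read the file back to confirm what was stored.
with open(csv_path, newline="", encoding="utf-8") as fh:
    stored = list(csv.DictReader(fh))
```

Once the loop has finished, `pd.read_csv(csv_path)` should load the whole file into a data frame in one step, instead of growing the frame row by row.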
CodePudding user response:
As far as I understand your problem, subject is a str and not a list. You have to collect the data in lists and pass those lists to the pandas DataFrame constructor, as I suggested in my comment. Please check the rough prototype below:
import pandas as pd

subject_list = []
from_list = []
date_list = []
for i in range(...):
    # replace prints with list.append()
    # print("Subject:", subject)
    # print("From:", From)
    # print("Date:", Date)
    subject_list.append(subject)
    from_list.append(From)
    date_list.append(Date)
df = pd.DataFrame({"Subject": subject_list, "From": from_list, "Date": date_list})
print(df)
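A variant of the prototype above collects one dict per message rather than three parallel lists. The sketch below also sidesteps a problem in the question's loop: `decode_header` can report a charset of None, and calling `.decode(None)` on bytes raises a TypeError. The helper names `header_text` and `message_row` and the sample message are hypothetical; only standard library `email` calls are used.

```python
import email
from email.header import decode_header, make_header


def header_text(msg, name):
    """Return a header as str, or '' if the message has no such header."""
    value = msg.get(name)
    if value is None:
        return ""
    # make_header joins all decoded fragments and copes with a charset of None.
    return str(make_header(decode_header(value)))


def message_row(raw_bytes):
    """Parse raw RFC 822 bytes and return the three headers as a dict."""
    msg = email.message_from_bytes(raw_bytes)
    return {name: header_text(msg, name) for name in ("Subject", "From", "Date")}


# Hypothetical raw message standing in for response[1] from imap.fetch().
raw = (
    b"Subject: =?utf-8?b?SMOpbGxv?=\r\n"
    b"From: Alice <alice@example.com>\r\n"
    b"Date: Mon, 01 Jan 2024 10:00:00 +0000\r\n"
    b"\r\n"
    b"body\r\n"
)

rows = []
rows.append(message_row(raw))  # inside the IMAP loop, append one row per message
```

After the loop, `pd.DataFrame(rows)` should give one row per message with Subject, From, and Date columns.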