Greetings for the day!
i am working on a new project to extract specific data mainly 1) Ticker 2) price 3) Date from email. now email could be of two types a) data comes in email body b) data comes as an attachment. currently i am focusing on the former, so far what i have a done i have created a sub-folder and named as Email_Snapper and set some rules so that whatever email i receive from x subjectline with gets in there.
Now my job is to retrieve specific values as mentioned above and store it in xlsx/csv file in C: drive. i have written the below code but i am literally new to regex family not sure how can this be done, any help on this would be greatly appreciated. The attached screenshot represents a sample format, we get the same ticker, price and date in different formats from different clients, i think i might have to write different functions (script) for each email.
import pyodbc
import win32com.client
import re
from datetime import datetime
EMAIL_ACCOUNT = '[email protected]'
EMAIL_SUBJ_SEARCH_STRING = 'QIC EUA Index Valuation'
EMAIL_CONTNT = {'Ticker': [], 'TickerLevel': [], 'DATE': []}
out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")
# root_folder = out_namespace.Folders.Item(1)
root_folder = out_namespace.GetDefaultFolder(6)
# out_iter_folder = root_folder.Folders['Inbox'].Folders['Email_snapper']
out_iter_folder = root_folder.Folders['Email_snapper']
char_length_of_search_substring = len(EMAIL_SUBJ_SEARCH_STRING)
# Count all items in the sub-folder
item_count = out_iter_folder.Items.Count
Flag = False
cnt = 0
if out_iter_folder.Items.Count > 0:
for i in range(item_count, 0, -1):
message = out_iter_folder.Items[i]
# Find only mail items and report, note, meeting etc items
if '_MailItem' in str(type(message)):
if EMAIL_SUBJ_SEARCH_STRING in message.Subject:
Flag = True
EMAIL_CONTNT['TickerLevel'].append(
message.Body[message.Body.find(r'SRAL'):message.Body.find(r'SRAL Ticker Level')].strip())
CodePudding user response:
This can be easily done with the inbuilt 'csv' library. Export the excel file as csv, then open it in python. You can read the contents with [DictReader][2]
. Each line is treated as a dict and you can easily access any value with its column name.
Here is an example:
import csv
with open('input_file.csv', 'r') as inputfile:
input_reader = csv.DictReader(inputfile)
for line in input_reader:
print(line["SRAL Ticker Level"])
You can save the contents as csv file with [DictWriter][3]
.
CodePudding user response:
There are several aspects when dealing with the Outlook object model.
First of all, use the Find
/FindNext
or Restrict
methods of the Items class to get items that correspond to your search criteria instead of iterating over all items in the folder. Read more about them in the following articles:
- How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
- How To: Use Restrict method to retrieve Outlook mail items from a folder
Second, the following line of code uses the Body
property which returns a plain text string:
message.Body[message.Body.find(r'SRAL'):message.Body.find(r'SRAL Ticker Level')].strip())
Instead, I'd suggest dealing with the HTMLBody
property for extracting values more precisely. The HTMLBody property returns or sets a string representing the HTML body of the specified item. The HTMLBody
property should be an HTML syntax string. Setting the HTMLBody
property will always update the Body
property immediately.
Also you may consider using the the Word editor. The WordEditor property of the Inspector
class returns an instance of the Word Document
which represents the message body. See Chapter 17: Working with Item Bodies for more information.