I've managed to write a piece of code (composed by multiple sources along the web, and adapted to my needs) which should do the following:
- Reads an excel file
- From column A to search the value of each cell within the subject of mails from a specific folder
- If matches (cell value equal to first 9 characters of the subject), save the attachment (each mail has only one attachment, no more, no less) with the value of cell in an "output" folder.
- If doesn't match, go to the next mail, respectively next cell value.
- In the end, display the run time (not very important, only for my knowledge)
The code actually works (tested with an email folder with only 9 emails). My problem is the run time.
The actual scope of the script is to look for 2539 values in a folder with 32700 emails and save the attachments.
I've done 2 runs as follow:
- 2539 values in 32700 emails (stopped after ~1 hour)
- 10 values in 32700 emails (stopped after ~40 minutes; in this time the script processed 4 values)
I would like to know / learn, if there a way to make the script faster, or if it's slow because it's bad written etc.
Below is my code:
from pathlib import Path
import win32com.client
import os
from datetime import datetime
import time
import openpyxl
#name of the folder created for output
output_dir = Path.cwd() / "Orders"
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.Folders.Item("Shared Mailbox Name")
inbox = folder.Folders.Item("Inbox")
messages = inbox.Items
wb = openpyxl.load_workbook(r"C:\Users\TEST\Path-to-excel\FolderName\ExcelName.xlsx")
sheet = wb['Sheet1']
names=sheet['A']
for cellObj in names:
ordno = str(cellObj.value)
print(ordno)
for message in messages:
subject = message.Subject
body = message.body
attachments = message.Attachments
if str(subject)[:9] == ordno:
output_dir.mkdir(parents=True, exist_ok=True)
for attachment in attachments:
attachment.SaveAsFile(output_dir / str(attachment))
else:
pass
start = time()
print(f'Time taken to run: {time() - start} seconds')
I need to mention that I am a complete rookie in Python thus any help from the community is welcomed, especially next to some clarifications of what I did wrong and why.
I've also read some similar questions but nothing helps, or at least I don't know how to adopt the methods.
Thank you!
CodePudding user response:
Your use case is too specific for anyone to be able to recreate, and hints about performance only generic but your main problem is a combination of "O x N" and synchronous processing: currently you are processing one value, one message at a time, which includes disk IO to get the e-mail.
You can certainly improve things by creating a single list of values from the workbook. You can then use this list with a processing pool (see the Python documentation) to read multiple e-mails at once.
But things might be even better if you can use the subject to query the mail server.
If you have follow-up questions, please break them down to specific parts of the task.
CodePudding user response:
First of all, instead of iterating over all items in the folder:
for message in messages:
subject = message.Subject
And then checking whether a subject starts from the specified string or includes such string:
if str(subject)[:9] == ordno:
Instead, you need to use the Find
/FindNext
or Restrictmethods of the
Items` class where you could get collection of items that correspond to your search criteria. Read more about these methods in the following articles:
- How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
- How To: Use Restrict method to retrieve Outlook mail items from a folder
For example, you could use the following restriction on the collection (taken form the VBA sample):
criteria = "@SQL=" & Chr(34) & "urn:schemas:httpmail:subject" & Chr(34) & " ci_phrasematch 'question'"
See Filtering Items Using a String Comparison for more information.
Also you may find the AdvancedSearch
method of the Application
class helpful. The key benefits of using the AdvancedSearch
method in Outlook are:
- The search is performed in another thread. You don’t need to run another thread manually since the
AdvancedSearch
method runs it automatically in the background. - Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The
Restrict
andFind
/FindNext
methods can be applied to a particularItems
collection (see theItems
property of theFolder
class in Outlook). - Full support for DASL queries (custom properties can be used for searching too). To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the
IsInstantSearchEnabled
property of theStore
class). - You can stop the search process at any moment using the
Stop
method of theSearch
class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information on that.