I have written a program that works reasonably well; however, I am pretty sure there is a way to speed it up.
Initially, I wrote it without the threading setup shown below (that said, I have a feeling the threading setup is having no effect at all).
I will say up front that I am totally new to threading, multiprocessing, and performance tuning in general.
I was hoping someone could run their eye over the snippet below and see whether there is a way to run parallel threads or processes, in short, to speed it up or at least process more files at once.
I am also having trouble getting any other performance improvements to work with the nested for loop:
for file in files:
    for IPAddress in IPAddresses:

where:
- files is a list of (gzipped) files
- IPAddresses is a list of IP addresses
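
For context, Search_files is roughly along these lines; this is a simplified illustration, and the real function does more:

import gzip

def Search_files(file, IPAddress):
    # Simplified placeholder for the real search logic: scan one
    # gzipped file line by line for one IP address
    with gzip.open(file, 'rt', errors='ignore') as fh:
        for line in fh:
            if IPAddress in line:
                print(f'{IPAddress} found in {file}')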
import threading

if __name__ == '__main__':
    files = [
        'file1',
        'file2',
        'file3'
    ]
    IPAddresses = [
        '1.1.1.1',
        '1.1.1.2',
        '1.1.1.3'
    ]

    threads = []
    for file in files:
        for IPAddress in IPAddresses:
            # Pass the function and its arguments separately so the
            # call actually runs inside the new thread
            t = threading.Thread(target=Search_files, args=(file, IPAddress))
            t.start()
            threads.append(t)
            print(f'file: {file} processed for IP Address: {IPAddress.upper()}\n')
    for thread in threads:
        thread.join()
CodePudding user response:
Here is an example of how to use multiprocessing.Pool together with itertools.product:
import multiprocessing
from time import sleep
from itertools import product

files = ["file1", "file2", "file3"]
IPAddresses = ["1.1.1.1", "1.1.1.2", "1.1.1.3"]

def my_func(tpl):
    # Each worker receives one (file, ip) tuple produced by product()
    f, ip = tpl
    sleep(1)  # stand-in for the real per-file work
    return f"Done {f}-{ip}!"

if __name__ == "__main__":
    with multiprocessing.Pool() as p:
        # imap_unordered yields each result as soon as its worker finishes
        for res in p.imap_unordered(my_func, product(files, IPAddresses)):
            print(res)
This prints the results as they come in (unordered), and all CPU cores should be utilized:
Done file1-1.1.1.1!
Done file1-1.1.1.2!
Done file3-1.1.1.2!
Done file2-1.1.1.3!
Done file2-1.1.1.2!
Done file2-1.1.1.1!
Done file1-1.1.1.3!
Done file3-1.1.1.3!
Done file3-1.1.1.1!
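
Since your Search_files already takes the file and the IP address as two separate arguments, you can skip the tuple-unpacking wrapper and hand it to Pool.starmap directly. Here is a sketch, assuming Search_files is defined at module level (so the worker processes can pickle it) and returns something printable:

import multiprocessing
from itertools import product

files = ["file1", "file2", "file3"]
IPAddresses = ["1.1.1.1", "1.1.1.2", "1.1.1.3"]

def Search_files(file, IPAddress):
    ...  # your existing search logic goes here

if __name__ == "__main__":
    with multiprocessing.Pool() as p:
        # starmap unpacks each (file, ip) pair into Search_files(file, ip)
        for res in p.starmap(Search_files, product(files, IPAddresses)):
            print(res)

Note that starmap blocks until every task has finished and returns the results in input order; if you want results as they complete, stick with imap_unordered and the wrapper shown above.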