I've never used multiprocessing or multithreading in Python code.
I have a long script (over 600 lines) and I need to run it using multiple CPUs.
I've seen how to use multiprocessing/threading on a single function, but I couldn't find a way to apply it to a whole program.
The code has the form of..
- for loop
- read csv
- do several preprocessing
- Mean of the values
- Compare it with other values ...
If I have to edit all the code for multiprocessing, it would take a long time, so could you please let me know if there is a way to apply multiprocessing to the whole code?
CodePudding user response:
To parallelize a function across multiple CPU cores, it must generally avoid mutating global state, and each call must be independent of the others. Consider this hypothetical function which respects those conditions (the comparison with other values is removed):
def f(file: Path) -> Value:
    data = read_csv(file)
    processed = pre_processing(data)
    return mean(processed)
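To make the shape above concrete, here is a minimal runnable sketch using only the standard library; the single-column CSV layout, the header row, and the use of the first column as the preprocessed values are all assumptions for illustration:

```python
import csv
from pathlib import Path
from statistics import mean

def f(file: Path) -> float:
    # Read the CSV, convert the first column to floats
    # (a stand-in for your real preprocessing), and return the mean.
    with open(file, newline="") as fh:
        rows = list(csv.reader(fh))
    values = [float(row[0]) for row in rows[1:]]  # rows[0] is the header
    return mean(values)
```

Each call touches only its own file and returns a value, so calls are independent and safe to run in parallel.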
You can easily multithread it with the built-in concurrent.futures module:
from concurrent.futures import ThreadPoolExecutor

files = ["/path/1/", ...]  # List of files
with ThreadPoolExecutor() as executor:
    values = executor.map(f, files)
    # Compare values here
    for value in values:
        ...
You can also use ProcessPoolExecutor for multiprocessing, which runs the calls in separate processes. For CPU-bound work like numeric preprocessing this is usually the better choice, since CPython threads are limited by the GIL.
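Switching to processes is almost a drop-in change, with one caveat: the worker function must be importable (defined at module level) and the pool should be created under an `if __name__ == "__main__":` guard so child processes can start safely on platforms that spawn rather than fork. A minimal sketch, with `square` as a stand-in for the real per-file work:

```python
from concurrent.futures import ProcessPoolExecutor

def square(n: int) -> int:
    # Stand-in for a CPU-bound task such as per-file preprocessing.
    return n * n

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # map distributes the calls across worker processes,
        # then yields results in input order.
        results = list(executor.map(square, range(5)))
    print(results)
```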