I have a trouble with concurrent.futures
. For the short background, I was trying to do a massive image manipulation with python-opencv2
. I stumbled upon performance issue, which is a pain considering it can take hours to process only hundreds of image. I found a solution by using concurrent.futures
to utilize CPU multicores to make the process go faster (because I noticed while it took really long time to process, it only use like 16% of my 6-core processor, which is roughly a single-core). So I created the code but then I noticed that the multiprocessing actually start from the beginning of the code instead of isolated around the function I just created. Here's the minimal working reproduction of the error:
import glob
import concurrent.futures
import cv2
import os
def convert_this(filename):
### Read in the image data
img = cv2.imread(filename)
### Resize the image
res = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
res.save("output/" filename)
try:
#create output dir
os.mkdir("output")
with concurrent.futures.ProcessPoolExecutor() as executor:
files = glob.glob("../project/temp/")
executor.map(convert_this, files)
except Exception as e:
print("Encountered Error!")
print(e)
filelist = glob.glob("output")
for f in filelist:
os.remove(f)
os.rmdir("output")
It gave me an error:
Encountered Error!
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
[WinError 183] Cannot create a file when that file already exists: 'output'
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\<username>\Anaconda3\envs\py37\lib\multiprocessing\spawn.py", line 105, in spawn_main
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
...
(it was repeating errors of the same "can't create file")
As you see, the os.mkdir
was ran even though it's outside of the convert_this
function I just defined. I'm not that new to Python but definitely new in multiprocessing and threading. Is this just how concurrent.futures
behaves? Or am I missing some documentation reading?
Thanks.
CodePudding user response:
Yes, multiprocessing must load the file in the new processes before it can run the function (just as it does when you run the file yourself), so it runs all code you have written. So, either (1) move your multiprocessing code to a separate file with nothing extra in it and call that, or (2) enclose your top level code in a function (e.g., main()
), and at the bottom of your file write
If __name__ == ”__main__":
main()
This code will only be run when you start the script, but not by the multiprocess-spawned version. See Python docs for details on this construction.