`concurrent.futures.ProcessPoolExecutor` on Python is run from beginning of file instead of the defi

Time:07-12

I'm having trouble with concurrent.futures. For some background: I was doing bulk image manipulation with OpenCV in Python (cv2) and ran into a performance problem; processing just a few hundred images can take hours. While it was running, I noticed it used only about 16% of my 6-core processor, which is roughly a single core, so I turned to concurrent.futures to spread the work across all cores. After writing the code, I noticed that each worker process starts executing from the beginning of the file instead of running only the function I defined. Here's a minimal reproduction of the error:

import glob
import concurrent.futures
import cv2
import os

def convert_this(filename):
    ### Read in the image data
    img = cv2.imread(filename)
    
    ### Convert the image to grayscale
    res = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    cv2.imwrite(os.path.join("output", os.path.basename(filename)), res)

try:
    #create output dir
    os.mkdir("output")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        files = glob.glob("../project/temp/*")
        executor.map(convert_this, files)
except Exception as e:
    print("Encountered Error!")
    print(e)
    filelist = glob.glob("output/*")
    for f in filelist:
        os.remove(f)
    os.rmdir("output")

It gave me an error:

Encountered Error!
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
  File "M:\pythonproject\testfolder\test.py", line 17, in <module>
    os.mkdir("output")
[WinError 183] Cannot create a file when that file already exists: 'output'
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
  File "M:\pythonproject\testfolder\test.py", line 17, in <module>
    os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\<username>\Anaconda3\envs\py37\lib\multiprocessing\spawn.py", line 105, in spawn_main
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
  File "M:\pythonproject\testfolder\test.py", line 17, in <module>
    os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
...
(the same "can't create file" error kept repeating)

As you can see, os.mkdir was run even though it's outside the convert_this function I defined. I'm not new to Python, but I am new to multiprocessing and threading. Is this just how concurrent.futures behaves, or did I miss something in the documentation?

Thanks.

CodePudding user response:

Yes. multiprocessing must load the file in each new process before it can run the function (just as Python does when you run the file yourself), so all of your top-level code runs again in every worker. You can see this in your traceback, where spawn.py re-executes test.py and hits os.mkdir again. So either (1) move your multiprocessing code to a separate file with nothing extra in it and call that, or (2) enclose your top-level code in a function (e.g., main()), and at the bottom of your file write

if __name__ == "__main__":
    main()

This block runs only when you start the script yourself, not when a spawned worker process loads the file, because `__name__` equals "__main__" only in the process you launched. See the Python docs for details on this construction.
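
To see why the guard works, here is a small single-process sketch. It assumes CPython's spawn implementation, which loads the parent's script with `runpy.run_path` under the name `__mp_main__`; the script text, the `events` list and the `run_as` helper are made up for illustration.

```python
import os
import runpy
import tempfile

# A stand-in for your script: one top-level statement plus a guarded main().
SCRIPT = '''
events = ["top-level code ran"]

def main():
    events.append("main() ran")

if __name__ == "__main__":
    main()
'''


def run_as(module_name):
    """Execute SCRIPT with __name__ set to module_name and return its events."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        module_globals = runpy.run_path(path, run_name=module_name)
    return module_globals["events"]


as_script = run_as("__main__")     # what happens when you start the script
as_worker = run_as("__mp_main__")  # what a spawned worker does on startup

print(as_script)  # ['top-level code ran', 'main() ran']
print(as_worker)  # ['top-level code ran']
```

Both runs execute the top-level assignment, but only the run named `__main__` reaches main(). That is the same difference that separates your original launch from the spawned workers.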

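Applied to the script in the question, option (2) might look like the sketch below. It is not meant as the definitive version: the `*` in the glob pattern, the `output_path_for` helper, the early return when nothing matches and the use of `cv2.imwrite` to save the result are assumptions on my part.

```python
import concurrent.futures
import glob
import os

INPUT_PATTERN = "../project/temp/*"  # assumption: images sit directly in this folder
OUTPUT_DIR = "output"


def output_path_for(filename, out_dir=OUTPUT_DIR):
    """Hypothetical helper: same file name as the input, placed inside out_dir."""
    return os.path.join(out_dir, os.path.basename(filename))


def convert_this(filename):
    """Convert one image to grayscale and write it to the output directory."""
    # Imported here so only processes that convert images load OpenCV;
    # a normal top-of-file import works just as well.
    import cv2

    img = cv2.imread(filename)
    if img is None:  # cv2.imread returns None when the file cannot be read
        raise ValueError(f"could not read image: {filename}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite(output_path_for(filename), gray)
    return filename


def main():
    files = glob.glob(INPUT_PATTERN)
    if not files:
        print("No input files found.")
        return
    # Runs once, in the parent only, because main() sits behind the guard.
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Iterating over the results re-raises any exception from a worker.
        for done in executor.map(convert_this, files):
            print("converted", done)


if __name__ == "__main__":
    main()
```

Only the imports, constants and function definitions remain at module level, so a worker that re-loads the file has nothing with side effects to repeat.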