Multithreading vs. Multiprocessing with OpenCV in Python


I have the following function:

import cv2

def Upscale(path_to_image):
    img = cv2.imread(path_to_image)
    # note: the model is created and loaded from disk on every call
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    path = 'LapSRN_x8.pb'
    sr.readModel(path)
    sr.setModel('lapsrn', 8)
    result = sr.upsample(img)
    cv2.imwrite(f'C:\\Users\\user\\Desktop\\PyStuff\\images\\{path_to_image}_resized.png', result)
    return result

This function reads an image, upscales it, writes the result to a folder, and returns the resulting image array. There is a list of PNG image paths of the following form:

file_list = ['page-1.png','page-2.png',...etc]

I attempted to use multithreading to make the process faster, as each image takes 95 seconds to complete (there are hundreds of images). The code for this is the following:

import cv2
import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed

res = []

with ThreadPoolExecutor(max_workers=10) as executor:
    future_to_response = {
        executor.submit(Upscale, f'C:\\Users\\rturedi\\Desktop\\DPI_proj\\images\\{i}'): i for i in file_list
    }
    t = tqdm.tqdm(total=len(future_to_response))

    for future in as_completed(future_to_response):
        res.append(future.result())
        t.update(1)

for i in range(len(res)):
    cv2.imwrite(f'{i}.png',res[i])

The above code multithreads the process and then loops over the results I appended to the list in order to store the images. This takes just as long as running each image consecutively; the multithreading does not make the process faster.

I attempted to fix this by instead using multiprocessing, and the code for this is as follows:

import multiprocessing
res = []
for i in file_list:
    p = multiprocessing.Process(target=Upscale((f'C:\\Users\\rturedi\\Desktop\\DPI_proj\\images\\{i}')))
    res.append(p)
    p.start()

However, this takes just as long (it does not decrease the time it takes to process each image), and furthermore, the output of my res array is not a list of image arrays but a list of Process objects:

 [<Process name='Process-54' pid=28028 parent=27048 stopped exitcode=1>,
 <Process name='Process-55' pid=18272 parent=27048 stopped exitcode=1>,
 <Process name='Process-56' pid=23116 parent=27048 stopped exitcode=1>,
 <Process name='Process-57' pid=5536 parent=27048 stopped exitcode=1>,
 <Process name='Process-58' pid=14496 parent=27048 stopped exitcode=1>,
 <Process name='Process-59' pid=16964 parent=27048 stopped exitcode=1>,
 <Process name='Process-60' pid=14832 parent=27048 stopped exitcode=1>,
 <Process name='Process-61' pid=19584 parent=27048 stopped exitcode=1>,
 <Process name='Process-62' pid=20244 parent=27048 stopped exitcode=1>,
 <Process name='Process-63' pid=28768 parent=27048 stopped exitcode=1>,
 <Process name='Process-64' pid=16164 parent=27048 stopped exitcode=1>,
 <Process name='Process-65' pid=21196 parent=27048 stopped exitcode=1>]

Does anyone have an idea of how I can accomplish making this process faster with either multithreading or multiprocessing?

CodePudding user response:

I think multiprocessing will work when you are running the function for multiple images at the same time.

Right now you are using a for loop and passing one image at a time to a single process.

Try the following approach and compare the time this takes with your original approach.

from multiprocessing import Process

if __name__ == '__main__':
    p1 = Process(target=Upscale, args=(file_list[0:len(file_list)//2],))
    p1.start()
    p2 = Process(target=Upscale, args=(file_list[len(file_list)//2:],))
    p2.start()
    p1.join()
    p2.join()

Inside the Upscale function, loop through the list of images and perform the same tasks as before.
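The list-splitting in that snippet generalises to any number of processes. A small helper (hypothetical name `chunk`, not from the question) that splits `file_list` into n roughly equal contiguous parts, so you can spawn one process per chunk:

```python
def chunk(items, n):
    """Split items into n roughly equal contiguous parts."""
    k, r = divmod(len(items), n)
    parts = []
    start = 0
    for i in range(n):
        # the first r parts get one extra element
        end = start + k + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts

# e.g. 7 pages split across 2 processes
file_list = [f'page-{i}.png' for i in range(1, 8)]
print(chunk(file_list, 2))
```

Each element of the returned list would then be passed as the single `args` entry of its own `Process`.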

Also, note that the function's arguments are passed separately via the "args" keyword of "multiprocessing.Process", as a tuple.

P.S. This is my first time answering; let me know if something is not clear.

For the output, you can try the following approach, which I picked up from https://superfastpython.com/multiprocessing-return-value-from-process/

# example of returning a variable from a process using a value
from random import random
from time import sleep
from multiprocessing import Value
from multiprocessing import Process

# function to execute in a child process
def task(variable):
    # generate some data
    data = random()
    # block, to simulate computational effort
    print(f'Generated {data}', flush=True)
    sleep(data)
    # return data via value
    variable.value = data

# protect the entry point
if __name__ == '__main__':
    # create shared variable
    variable = Value('f', 0.0)
    # create a child process process
    process = Process(target=task, args=(variable,))
    # start the process
    process.start()
    # wait for the process to finish
    process.join()
    # report return value
    print(f'Returned: {variable.value}')

CodePudding user response:

Multithreading / multiprocessing can be inefficient or even counterproductive when used to optimise the execution of neural networks on CPUs since a network execution may already be multithreaded.

Before any optimisation work, you should first profile your code in order to identify what is slow and what can be improved. Here are some relevant questions you can ask yourself during this optimisation process:

  • Does executing a single network already max out all the CPUs?
  • Is it possible to execute the network on a faster accelerator such as the GPU?
  • In the first place, is your network optimised (FP16, vectorisation, etc.)?
  • Is OpenCV up to date and compiled to take advantage of all the available compute power?
  • What is the memory footprint of a single network, and how many could you execute at the same time on the CPU without out-of-memory errors?
  • Is there any unnecessary redundant allocation (such as the one I pointed out in another of your questions)?
  • Are there obvious operations that would greatly benefit from being run concurrently (such as imwrite and imread being memory-bound while upsample is compute-bound)?

This list is not exhaustive; try to search for methods to profile, benchmark, and optimise neural network execution.
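As a starting point for that profiling, timing each stage separately shows where the 95 seconds per image actually go. A minimal sketch (the `timed` helper is hypothetical; the commented-out calls are the ones from the question):

```python
import time

def timed(label, fn, *args):
    """Run fn(*args), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args)
    print(f'{label}: {time.perf_counter() - start:.2f}s')
    return result

# inside Upscale, for example:
# img    = timed('imread',   cv2.imread, path_to_image)
# result = timed('upsample', sr.upsample, img)
# ok     = timed('imwrite',  cv2.imwrite, out_path, result)
```

If `upsample` dominates and already saturates every core, neither threads nor extra processes will help; if `readModel` is significant, loading the model once per process instead of once per image is the cheaper win.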
