I have the following code:
pool = Pool(cpu_count())
pool.imap(process_item, items, chunksize=100)
In the process_item() function I am using structures that are expensive to create but reusable (though not concurrently shareable). Currently each call of process_item() creates the resource again in a local variable. It would be a great performance benefit to create it once per worker and then reuse it.
Question
How can I have cpu_count() dedicated instances of that resource, and how should I implement process_item() so that it accesses the instance belonging to its particular worker?
Answer
If you cannot use anything outside the standard library, I would suggest using an initializer when creating the pool:
from multiprocessing import Pool
import os
import random

class A:
    def __init__(self):
        self.var = random.randint(0, 1000)

    def get(self):
        # Print the per-process value together with the worker's PID.
        print(self.var, os.getpid())

def worker(some_arg):
    # Reuses the instance that initializer() created in this process.
    expensive_var.get()

def initializer(*args):
    # Runs once in each worker process when that process starts.
    global expensive_var
    expensive_var = A()

if __name__ == "__main__":
    with Pool(8, initializer=initializer, initargs=()) as pool:
        for result in pool.imap(worker, range(100)):
            pass
Create your expensive objects inside the initializer and make them global. Then you can use them inside the function you pass to the pool. This works because the initializer is executed once in each process of the pool, when that process starts. Declaring the variable global therefore makes it a global in the scope of that child process only, so it is accessible whenever that process runs the function you passed to the pool.
There was a Stack Overflow answer that explained all this better, but I can't seem to find it right now. This is basically the gist of it.