Global list variable in multiprocessing pool is returning as empty in Python?


I declared an empty list as a global variable and assigned some values to it, but when I run a multiprocessing pool, the global variable shows up as an empty list instead of holding the assigned values.

from multiprocessing import Process, Pool
camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"},
               {"id": "3", "url": "cam-e.jpg", "area": "2"},
               {"id": "4", "url": "cam-f.jpg", "area": "2"}]

bulb_data = []


def framed_images(fake):
    print(bulb_data)


if __name__ == '__main__':
    print("camera_data - ",camera_data)
    for data in camera_data:
        bulb_data.append({"url": data["url"], "bulb": False})
    print("bulb_data - ",bulb_data)
    # framed_images()
    fake_Data = bulb_data

    with Pool(processes=4) as pool:
        pool.map(framed_images, fake_Data)

I am getting output as:

camera_data - [{'id': '1', 'url': 'cam-c.jpg', 'area': '1'}, {'id': '2', 'url': 'cam-d.jpg', 'area': '1'}, {'id': '3', 'url': 'cam-e.jpg', 'area': '2'}, {'id': '4', 'url': 'cam-f.jpg', 'area': '2'}]

bulb_data - [{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[]
[]
[]
[]

The last four empty lists come from the multiprocessing pool. I expected output like this:

[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]

so that I can edit the list of dictionaries in each process while keeping the global variable updated.

CodePudding user response:

It is likely because you are using the spawn start method; with spawn, child processes inherit only the bare minimum from the process that spawned them. This is the default start method on macOS and Windows, and it is the only start method available on Windows.

You can read the documentation here on the different start methods.

The documentation also points this out about global variables when using the spawn or forkserver start methods:

Bear in mind that if code run in a child process tries to access a global variable, then the value it sees (if any) may not be the same as the value in the parent process at the time that Process.start was called. However, global variables which are just module level constants cause no problems.

https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#the-spawn-and-forkserver-start-methods
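For reference, you can check which start method your platform uses at runtime. A minimal sketch (note that forcing fork is only possible on POSIX systems, and even then children receive a copy of the parent's globals at fork time, not shared memory):

import multiprocessing

if __name__ == '__main__':
    # 'spawn' on Windows and macOS (Python 3.8+), 'fork' on Linux
    print(multiprocessing.get_start_method())

    # On POSIX you could force fork so children inherit a snapshot of the
    # parent's globals; this must be called before any pool or process is created
    # multiprocessing.set_start_method('fork')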

CodePudding user response:

The input argument (the iterable) to pool.map is mandatory, and you tried to satisfy it with a trick: an unused parameter in framed_images plus a global list, which is what leads to that output.
The reason is that the multiprocessing pool creates subprocesses for the function and divides the argument (here, the list fake_Data) among them according to the chunksize parameter, and subprocesses have no shared memory.
You could share the object using shared ctypes objects, although I have never tried it: https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects
That said, I think you can achieve what you want with threads, which do share memory; subprocesses are not what you need here. A minimal thread-based sketch follows.
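Here is a sketch of that thread-based approach, using multiprocessing.pool.ThreadPool as a drop-in replacement for Pool (trimmed to two cameras for brevity):

from multiprocessing.pool import ThreadPool

camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"}]

bulb_data = []


def framed_images(fake):
    # threads run inside the same process, so the module-level list
    # populated in the main thread is visible here
    print(bulb_data)


if __name__ == '__main__':
    for data in camera_data:
        bulb_data.append({"url": data["url"], "bulb": False})

    # ThreadPool has the same interface as Pool but uses threads
    with ThreadPool(processes=4) as pool:
        pool.map(framed_images, bulb_data)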

CodePudding user response:

As pointed out, on Windows (or macOS, as both use spawn) each spawned "child" has its own separate memory and imports your script, so each worker is essentially doing:

import your_script
print(your_script.bulb_data)

If you run this code you will get the empty list. On Linux it is slightly different: you will get your "expected result" because Linux uses fork, but the memory is still not shared, and any modification in one process will not affect the other processes.

The way around this is to use a managed list, which lives in another process and uses IPC to synchronize the list across processes:

from multiprocessing import Pool, Manager

camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"},
               {"id": "3", "url": "cam-e.jpg", "area": "2"},
               {"id": "4", "url": "cam-f.jpg", "area": "2"}]


def framed_images(fake):
    # bulb_data here is the managed-list proxy installed by initializer_func
    print(bulb_data)


def initializer_func(bulb_data_list):
    # runs once in each worker: publish the proxy as a global so
    # framed_images can see it
    global bulb_data
    bulb_data = bulb_data_list


if __name__ == '__main__':
    manager = Manager()
    bulb_data = manager.list()  # proxy to a list living in the manager process
    print("camera_data - ", camera_data)
    for data in camera_data:
        bulb_data.append({"url": data["url"], "bulb": False})
    print("bulb_data - ", bulb_data)
    fake_Data = list(bulb_data)

    with Pool(processes=4, initializer=initializer_func, initargs=(bulb_data,)) as pool:
        pool.map(framed_images, fake_Data)

Note that every access to this "shared list" involves IPC, which is slower than a normal list, so keep its use to a minimum and avoid storing many or large objects in it. See the documentation on sharing state between processes: https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes
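If the workers only need to read the data or produce per-item results, it is often simpler to avoid shared state entirely and collect results through pool.map's return value. A minimal sketch (the True value is just a stand-in for whatever each worker would compute):

from multiprocessing import Pool


def framed_images(fake):
    # work on the single item this worker received and return a result
    # instead of mutating shared state
    return {"url": fake["url"], "bulb": True}


if __name__ == '__main__':
    fake_Data = [{"url": "cam-c.jpg", "bulb": False},
                 {"url": "cam-d.jpg", "bulb": False}]

    with Pool(processes=4) as pool:
        # map pickles each item out to a worker and pickles the return
        # values back; no shared list is involved
        results = pool.map(framed_images, fake_Data)
    print(results)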

CodePudding user response:

Solution code:

I used a shared queue to pass the output of every child process back to the main process, so I can do whatever I want with the results.

from multiprocessing import SimpleQueue
from multiprocessing.pool import Pool

camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"},
               {"id": "3", "url": "cam-e.jpg", "area": "2"},
               {"id": "4", "url": "cam-f.jpg", "area": "2"}]


# initialize worker processes: store the shared queue in a global
def init_worker(shared_queue):
    global queue
    queue = shared_queue


# task executed in a worker process
def task(identifier):
    if identifier["area"] == '1':
        queue.put("GREEN")
    if identifier["area"] == '2':
        queue.put("RED")


# protect the entry point
if __name__ == '__main__':
    # create a shared queue
    shared_queue = SimpleQueue()
    # create and configure the process pool
    fake_data = camera_data
    with Pool(initializer=init_worker, initargs=(shared_queue,)) as pool:
        # issue tasks into the process pool
        _ = pool.map_async(task, fake_data)
        # drain exactly one result per input item
        for _ in fake_data:
            result = shared_queue.get()
            print(f'Got {result}', flush=True)
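One caveat with this pattern: results come off the queue in completion order, not input order. If you need to know which camera produced each result, one variation is to tag each item before putting it on the queue; the (id, colour) tuple format below is my own suggestion, not from the original post:

# task executed in a worker process, tagging each result with the camera id
def task(identifier):
    # assumes area is '1' or '2', as in camera_data above
    colour = "GREEN" if identifier["area"] == '1' else "RED"
    queue.put((identifier["id"], colour))

The main loop would then unpack both values: cam_id, result = shared_queue.get().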