python multiprocessing - Turning a nested Manager().list() into a nested Python list

Time:09-21

I created a Manager list of lists to share between processes so that it updates correctly, but I don't know how to turn it back into an ordinary Python list of lists afterwards so I can access it:

myList = Manager().list([Manager().list()])
p = Pool(processes=30)
p.apply_async(update_list, args=(myList))
p.close()
p.join()

myList = ?

I am aware of this method for turning a Manager list into a Python list, but I need help figuring out how to apply it to a nested list:

myList = Manager().list()
p = Pool(processes=30)
p.apply_async(update_list, args=(myList))
p.close()
p.join()

myList = list(myList)

EDIT: @Grismar suggested using myList = [list(sub) for sub in myList], but this minimal, reproducible example throws a FileNotFoundError: [Errno 2] No such file or directory on my end:

from multiprocessing import Pool, Manager

def update_list(myList):
    myList.append(['test1','test2'])

myList = Manager().list([Manager().list()])
p = Pool(processes=30)
p.apply_async(update_list, args=(myList))
p.close()
p.join()
myList = [list(sub) for sub in myList]

Answer:

This program works (Win10, Python 3.10, run from the command line):

from multiprocessing import Pool, Manager

def update_list(x):
    print(x)
    x.append(['test1','test2'])
    print("After", x)
    
if __name__ == "__main__":
    myList = Manager().list()
    print(myList)
    with Pool(processes=30) as p:
        r = p.apply_async(update_list, args=(myList,))
        r.wait()
    print(myList)
    myList = [list(sub) for sub in myList]
    print(myList)

The output is:

[]
[]
After [['test1', 'test2']]
[['test1', 'test2']]
[['test1', 'test2']]

I made some changes:

The apply_async function returns an AsyncResult object. You must wait until the result is available by calling .wait() on this object; otherwise leaving the with block, which calls terminate() on the Pool, can stop the worker before the myList list is updated.
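
To see why waiting matters, and why .get() is often the better choice, here is a minimal sketch. The task function fails and the helper run_failing_task are hypothetical names used only for illustration: .wait() just blocks until the task is done, whereas .get() also re-raises in the parent any exception the worker raised, which would otherwise never be shown.

```python
from multiprocessing import Pool


def fails(x):
    # Hypothetical task that always raises, to show where the error surfaces.
    raise ValueError(f"bad input: {x}")


def run_failing_task():
    with Pool(processes=2) as pool:
        result = pool.apply_async(fails, args=(1,))
        result.wait()                    # blocks until the task finishes; never raises
        succeeded = result.successful()  # False here, because the task raised
        try:
            result.get()                 # re-raises the worker's ValueError in the parent
        except ValueError as exc:
            return succeeded, str(exc)
    return succeeded, None


if __name__ == "__main__":
    print(run_failing_task())            # (False, 'bad input: 1')
```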

I don't see the purpose of initializing myList with a list consisting of another Manager().list() object. I don't see how that embedded Manager().list() object will get updated, and I think that's the source of your FileNotFound error. I see that you intend to put other lists inside of myList, but you don't have to do anything special because of that. It's just standard Python to put a list inside of a list.
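
A minimal sketch of that idea, assuming the workers produce ordinary Python lists (add_row and collect_rows are hypothetical names, and the with Manager() block is a swapped-in way of keeping the manager alive): the tasks append plain lists to a single managed list, and because the manager hands those items back by value, one list() call already yields a list of lists.

```python
from multiprocessing import Manager, Pool


def add_row(shared, row):
    # Each task appends an ordinary Python list; the manager stores it by value.
    shared.append(row)


def collect_rows(rows):
    # collect_rows/add_row are hypothetical names used for illustration.
    with Manager() as manager:
        shared = manager.list()           # the only managed (proxy) object
        with Pool(processes=2) as pool:
            pending = [pool.apply_async(add_row, args=(shared, row)) for row in rows]
            for task in pending:
                task.get()                # wait, and re-raise any worker error
        # Every item is already a plain list, so list() alone gives a list of lists.
        # Sorting only makes the order deterministic; tasks may finish in any order.
        return sorted(list(shared))


if __name__ == "__main__":
    print(collect_rows([['test3', 'test4'], ['test1', 'test2']]))
    # [['test1', 'test2'], ['test3', 'test4']]
```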

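When writing a multiprocessing script, code that creates processes or shared objects should be placed inside an if __name__ == "__main__": block. With the spawn start method (the default on Windows), every child process re-imports the main module, and unguarded top-level code would run again in each child.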

You missed a comma in the apply_async call. The args argument must be a tuple, and a one-item tuple needs a trailing comma: (myList) is just myList in parentheses, while (myList,) is a tuple containing it.
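
A quick way to see the difference without a Pool: a worker ends up calling the target roughly as func(*args), so the sketch below mimics that with a hypothetical helper, call_like_pool. Passing (rows) unpacks the list itself into separate arguments.

```python
def call_like_pool(func, args):
    # Simplified model (an assumption, not Pool's real code path):
    # a Pool worker ends up calling func(*args).
    return func(*args)


rows = ['test1', 'test2']

print((rows) is rows)                # True: parentheses alone do not create a tuple
print(call_like_pool(len, (rows,)))  # 2: the call is len(rows)

try:
    call_like_pool(len, (rows))      # becomes len('test1', 'test2')
except TypeError as exc:
    print("TypeError:", exc)
```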

If you use a context manager when creating a Pool, it will clean up the Pool resources for you.

I changed the argument to "x" in update_list - not actually necessary but it's hard to read code when the same variable name is used again and again.

I think you have already figured out how to handle the nested list result, which is not a multiprocessing problem.

Answer:

There are numerous errors in your code, many of them already pointed out by other answers and comments here, so I won't repeat them. As for your FileNotFoundError: it happens because nothing keeps a reference to the inner Manager().list(), so that list proxy, and with it the Manager that owns it, is garbage-collected as soon as the outer list has been created, which shuts down that manager's server process. The outer list still holds a proxy pointing at the dead server, so when you access the elements inside myList you get an error. To fix that, simply keep a reference to the list before nesting it:

alist = Manager().list()
myList = Manager().list([alist])
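
An alternative sketch, assuming Python 3.6 or later (when shared objects became nestable): create both lists from one Manager that stays referenced for as long as the proxies are used, here via a with block. This follows the nesting pattern shown in the multiprocessing documentation rather than the two-manager code above; nested_single_manager is a hypothetical name.

```python
from multiprocessing import Manager
from multiprocessing.managers import ListProxy


def nested_single_manager():
    # One Manager, held by the with-block, owns both lists, so its server
    # process stays alive while the proxies are in use.
    with Manager() as manager:
        inner = manager.list()
        outer = manager.list([inner])    # outer stores a proxy to inner
        outer[0].append('test1')         # outer[0] is a proxy, so this updates inner
        return list(inner), isinstance(outer[0], ListProxy)


if __name__ == '__main__':
    print(nested_single_manager())       # (['test1'], True)
```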

As for your main question: as others have pointed out, you can use list() together with a list comprehension to convert a manager list into an ordinary list, but only if the manager list is nested one level deep. For example, consider this code, where the list is nested two levels deep:

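from multiprocessing import Pool, Manager


def update_list(myList):
    myList.append(['test1','test2'])


if __name__ == '__main__':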
    alist = Manager().list()
    blist = Manager().list([alist])

    # myList is two nested levels deep
    myList = Manager().list([blist])

    p = Pool(processes=1)
    p.apply_async(update_list, args=(myList, )).get()

    p.close()
    p.join()
    
    print('before:', myList)
    myList = [list(sub) for sub in myList]
    print('after:', myList)

This does not produce the expected output; the inner list is still a ListProxy after the conversion:

before: [<ListProxy object, typeid 'list' at 0x1d8a7d22e50>, ['test1', 'test2']]
after: [[<ListProxy object, typeid 'list' at 0x23412e13a60>], ['test1', 'test2']]

Therefore, I prefer the more general approach below, which works for any number of nesting levels (including none): it returns the expected output even if you pass a plain list instead of a managed one, or a managed list without any nested lists. It works because it checks each element of the list recursively:

from multiprocessing import Pool, Manager
from multiprocessing.managers import ListProxy


def update_list(myList):
    myList.append(['test1','test2'])


def get_value(l):
    return [get_value(sub_l) if isinstance(sub_l, ListProxy) else sub_l for sub_l in l]


if __name__ == '__main__':
    alist = Manager().list()
    blist = Manager().list([alist])

    # myList is two nested levels deep
    myList = Manager().list([blist])

    p = Pool(processes=1)
    p.apply_async(update_list, args=(myList, )).get()

    p.close()
    p.join()

    print('before:', myList)
    myList = get_value(myList)
    print('after:', myList)

Output:

before: [<ListProxy object, typeid 'list' at 0x226dfb72e20>, ['test1', 'test2']]
after: [[[]], ['test1', 'test2']]

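If your list is long, you can make the above get_value faster by fetching each level of the list in a single request instead of contacting the manager server once per element. Note that list(l) does not achieve this: ListProxy defines __getitem__ and __len__ but no __iter__, so both iteration and list() fall back to one __getitem__ call per index. Slicing, on the other hand, is a single __getitem__ call that returns an ordinary list:

def get_value(l):
    l = l[:]  # fetch this level in one request; nested proxies stay proxies
    return [get_value(sub_l) if isinstance(sub_l, ListProxy) else sub_l for sub_l in l]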
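
The difference can be seen without a manager at all. CountingSequence below is a hypothetical stand-in that, like ListProxy, defines __len__ and __getitem__ but no __iter__, and counts every call; on a real proxy each such call would be one request to the manager server.

```python
class CountingSequence:
    """Hypothetical stand-in for a ListProxy: __len__ and __getitem__ only,
    with a counter for every call (each call would be a round trip on a proxy)."""

    def __init__(self, data):
        self._data = list(data)
        self.calls = 0

    def __len__(self):
        self.calls += 1
        return len(self._data)

    def __getitem__(self, index):
        self.calls += 1
        return self._data[index]   # IndexError past the end stops iteration


by_iteration = CountingSequence(range(1000))
list(by_iteration)                 # falls back to __getitem__(0), __getitem__(1), ...
print(by_iteration.calls)          # a number of at least 1000

by_slice = CountingSequence(range(1000))
by_slice[:]                        # a single __getitem__(slice(None, None, None)) call
print(by_slice.calls)              # 1
```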

As a side note, it seems that you want to create nested manager lists so that the outer list gets updated automatically whenever a change is made to a nested list. If that is the case, you may want to check this answer, which outlines a way to handle that automatically without having to create nested manager lists by hand.
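
If you keep plain lists inside a managed list instead, note a caveat from the multiprocessing documentation: modifying a mutable item in place through a list proxy is not propagated, because the proxy cannot tell that the item changed; reassigning the item is what sends the new value back to the manager. A minimal sketch (nested_update_demo is a hypothetical name):

```python
from multiprocessing import Manager


def nested_update_demo():
    # Hypothetical helper showing the "reassign to propagate" pattern.
    with Manager() as manager:
        shared = manager.list([['test1']])   # the inner list is plain, stored by value

        shared[0].append('test2')            # shared[0] returns a local copy: change is lost
        lost = shared[0]

        row = shared[0]                      # fetch a copy ...
        row.append('test2')                  # ... modify it locally ...
        shared[0] = row                      # ... and assign it back to update the manager
        kept = shared[0]
    return lost, kept


if __name__ == "__main__":
    print(nested_update_demo())              # (['test1'], ['test1', 'test2'])
```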
