manager.dict() "skipping" to update some values in multiprocessing ~ Python


In multiprocessing, I want to update a manager.dict(). It does get updated, but some of the data is skipped along the way. What can be done? My code is similar to this:

from multiprocessing import Process, Manager

manager = Manager()
a = manager.dict()
a['url_info'] = manager.list()


def parse_link(link):
    # parse the link; link_parser() returns a dict of page info
    parsed_info = link_parser(link)
    a['url_info'].append(parsed_info)

# links contains many URLs that need to be parsed.
links = ["https://url.com/1","https://url.com/2", "https://url.com/3"]


processes = []

for link in links:
    p = Process(target=parse_link, args=(link,))
    p.start()
    processes.append(p)

for process in processes:
    process.join()

link_parser() is a function that returns a dictionary, which contains the information about the scraped/parsed webpage.

> print(list(a['url_info']))
> ['#info_1', '#info_3']

Here the multiprocessing program skipped updating #info_2 in the list (the manager.list). Help me please.

CodePudding user response:

Here's some code that demonstrates an improved structure for what you're trying to do.

Obviously it doesn't have the detail of your link_parser(), but you'll get the point.

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager
from functools import partial

LINKS = ['abc', 'def', 'ghi']
KEY = 'url_info'


def parse_link(a, link):
    # append() on a manager.list proxy is forwarded to the manager
    # process as a single call, so concurrent appends are not lost
    a[KEY].append(link)


def main():
    with Manager() as manager:
        a = manager.dict()
        a[KEY] = manager.list()
        with ProcessPoolExecutor() as executor:
            executor.map(partial(parse_link, a), LINKS)
        print(a[KEY])


if __name__ == '__main__':
    main()

Output:

['abc', 'def', 'ghi']
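If you don't actually need the shared dict elsewhere, you can drop the Manager entirely and let the pool collect the return values for you: executor.map() returns the workers' results in input order, so there is no shared state to get skipped. A minimal sketch, using a stand-in parse function in place of your link_parser():

```python
from concurrent.futures import ProcessPoolExecutor

LINKS = ['abc', 'def', 'ghi']


def parse_link(link):
    # stand-in for your link_parser(); returns a dict per link
    return {'url': link}


def main():
    # map() submits one task per link and yields results in input order,
    # so no shared Manager structures are needed at all
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(parse_link, LINKS))
    return results


if __name__ == '__main__':
    print(main())
```

This sidesteps the whole class of "lost update" problems, because each worker only returns data and the parent process assembles the list.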