I am trying to improve the execution time of a script which imports data from CSV files into a Graphite/Go-Carbon time-series database.
This is the loop which iterates over all the zip files and reads each one in the function execute_run. I tried this code but I got an error:
for idx4, Lst_f in enumerate(full_csvfile_paths):
    if lst_metrics in Lst_f:
        zip_file = Lst_f
        with zipfile.ZipFile(zip_file) as zipobj:
            print("Using ZipFile:", zipobj.filename)
            #execute_run(zipobj.filename, confcsv_path, storage_type, serial)
            output = subprocess.run(execute_run(zipobj.filename, confcsv_path, storage_type, serial), stdout=subprocess.PIPE)
            print("Return code: %i" % output.returncode)
            print("Output data: %s" % output.stdout)
Error:
Traceback (most recent call last):
  File "./02-pickle-client.py", line 451, in <module>
    main()
  File "./02-pickle-client.py", line 361, in main
    output = subprocess.run(execute_run(zipobj.filename, confcsv_path, storage_type, serial),stdout=subprocess.PIPE)
  File "/usr/lib64/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1240, in _execute_child
    args = list(args)
TypeError: 'NoneType' object is not iterable
Is there a way to execute the function execute_run X times in parallel and check that each run completes correctly?
Many thanks for your help.
CodePudding user response:
The TypeError happens because subprocess.run expects a command to execute (a sequence of argument strings), but it is being given the return value of execute_run(...), which is None. Since execute_run is a Python function rather than an external program, subprocess is not the right tool here. Instead of subprocess.run, I would recommend using multiprocessing.Pool with its starmap method, as described in the Python multiprocessing docs.
This could look something like this:
import multiprocessing as mp
# Step 1: Create a multiprocessing.Pool and specify the number of worker processes to use (here 4).
pool = mp.Pool(4)
# Step 2: pool.starmap takes an iterable of argument tuples and unpacks each tuple into one call of the target function, running the calls in parallel.
results = pool.starmap(My_Function, [(variable1, variable2, variable3) for i in data])
# Step 3: Don't forget to close the pool and wait for the workers to finish.
pool.close()
pool.join()
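Applied to your loop, a minimal sketch could look like this (assuming execute_run is a plain module-level function taking the four arguments shown in your question, and that it returns a value you can use to check the run, e.g. a return code):

import multiprocessing as mp

# Build one argument tuple per matching zip file. zipfile.ZipFile(path).filename
# is just the path that was passed in, so the path itself is enough here.
tasks = [(Lst_f, confcsv_path, storage_type, serial)
         for Lst_f in full_csvfile_paths
         if lst_metrics in Lst_f]

# Run execute_run on 4 files at a time; starmap unpacks each tuple
# into one call of execute_run and returns the results in order.
with mp.Pool(4) as pool:
    results = pool.starmap(execute_run, tasks)

# One result per zip file, in the same order as tasks.
for (path, *_), result in zip(tasks, results):
    print("File:", path, "->", result)

If execute_run raises an exception in a worker, pool.starmap re-raises it in the parent process, so a failed import will not go unnoticed. Note that the function and its arguments must be picklable, which is the case for a module-level function and plain strings.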