I am trying to improve the execution time of a script which imports data from CSV files into a Graphite/Go-Carbon time-series database.
This is the loop which iterates over all the zip files and reads each one in the function execute_run. I tried this code but I got an error:
for idx4, Lst_f in enumerate(full_csvfile_paths):
    if lst_metrics in Lst_f:
        zip_file = Lst_f
        with zipfile.ZipFile(zip_file) as zipobj:
            print("Using ZipFile:", zipobj.filename)
            #execute_run(zipobj.filename, confcsv_path, storage_type, serial)
            output = subprocess.run(execute_run(zipobj.filename, confcsv_path, storage_type, serial), stdout=subprocess.PIPE)
            print("Return code: %i" % output.returncode)
            print("Output data: %s" % output.stdout)
Error:
Traceback (most recent call last):
  File "./02-pickle-client.py", line 451, in <module>
    main()
  File "./02-pickle-client.py", line 361, in main
    output = subprocess.run(execute_run(zipobj.filename, confcsv_path, storage_type, serial),stdout=subprocess.PIPE)
  File "/usr/lib64/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1240, in _execute_child
    args = list(args)
TypeError: 'NoneType' object is not iterable
Is there a way to execute the function execute_run X times in parallel and check that each run completes correctly?
Many thanks for your help.
CodePudding user response:
The TypeError happens because subprocess.run expects a command to execute (a sequence of argument strings), but it is being given the return value of execute_run(...), which is None. Since execute_run is a Python function rather than an external program, subprocess is not the right tool here. Instead of subprocess.run, I would recommend using multiprocessing.Pool with its starmap method, as described in the Python multiprocessing docs.
This could look something like this:
import multiprocessing as mp
# Step 1: Create a multiprocessing.Pool and specify the number of worker processes to use (here 4).
pool = mp.Pool(4)
# Step 2: pool.starmap takes an iterable of argument tuples and unpacks each tuple into one call of the target function, running the calls in parallel.
results = pool.starmap(My_Function, [(variable1, variable2, variable3) for i in data])
# Step 3: Don't forget to close the pool and wait for the workers to finish.
pool.close()
pool.join()
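Applied to your loop, a minimal sketch could look like this (assuming execute_run is a plain module-level function taking the four arguments shown in your question, and that it returns a value you can use to check the run, e.g. a return code):

import multiprocessing as mp

# Build one argument tuple per matching zip file. zipfile.ZipFile(path).filename
# is just the path that was passed in, so the path itself is enough here.
tasks = [(Lst_f, confcsv_path, storage_type, serial)
         for Lst_f in full_csvfile_paths
         if lst_metrics in Lst_f]

# Run execute_run on 4 files at a time; starmap unpacks each tuple
# into one call of execute_run and returns the results in order.
with mp.Pool(4) as pool:
    results = pool.starmap(execute_run, tasks)

# One result per zip file, in the same order as tasks.
for (path, *_), result in zip(tasks, results):
    print("File:", path, "->", result)

If execute_run raises an exception in a worker, pool.starmap re-raises it in the parent process, so a failed import will not go unnoticed. Note that the function and its arguments must be picklable, which is the case for a module-level function and plain strings.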