I am not familiar with Python, but I would like to run a function that reads and writes multiple files in parallel. Here is a minimal example:
from multiprocessing import Pool
import pandas as pd
def multiple(input_path, output_path, n):
    df = pd.read_csv(input_path, index_col=0)
    new_df = df.multiply(n)
    new_df.to_csv(output_path)

workers = 6
input_filenames = [f'input_{i}.csv' for i in range(1, 11)]
output_filenames = [f'output_{i}.csv' for i in range(1, 11)]

with Pool(workers) as pool:
    pool.map(multiple, ...)
With a plain for loop, I can do it like this:

for i, input_file in enumerate(input_filenames):
    input_path = input_filenames[i]
    output_path = output_filenames[i]
    multiple(input_path, output_path, 2)
How should I convert this to pool.map so that each input filename is matched with the output filename at the same index, and so that all three arguments (input_path, output_path, n) are passed to the function?
Thank you!
CodePudding user response:
You can zip the three lists input_filenames / output_filenames / n into one iterable of tuples, and then pass the function multiple and that iterable to pool.map. It takes each element from the iterable, and a worker executes multiple on it. Note that pool.map passes each tuple as a single argument, so multiple has to unpack it itself:
from multiprocessing import Pool
import pandas as pd

def multiple(inp_out_n):
    # Unpack the single tuple argument that pool.map passes in
    input_path, output_path, n = inp_out_n
    df = pd.read_csv(input_path, index_col=0)
    new_df = df.multiply(n)
    new_df.to_csv(output_path)

workers = 6
input_filenames = [f'input_{i}.csv' for i in range(1, 11)]
output_filenames = [f'output_{i}.csv' for i in range(1, 11)]
n = 10 * [2]
in_out_n = zip(input_filenames, output_filenames, n)

pool = Pool(processes=workers)
pool.map(multiple, in_out_n)
pool.close()
CodePudding user response:
Instead of using a pool, you can create a Process for each file, point it at the multiple function, and give it the paths as arguments.
from multiprocessing import Process

processes = []
for i, input_file in enumerate(input_filenames):
    input_path = input_filenames[i]
    output_path = output_filenames[i]
    # Start one process per file
    process = Process(target=multiple, args=(input_path, output_path, 2))
    processes.append(process)
    process.start()

# Wait for all processes to finish
for process in processes:
    process.join()