I am trying to execute a Python script in parallel, using the lines of a file as the arguments to the script. The file is named experiments.txt
and might look like this:
--x_timesteps 3 --y_timesteps 3 --exp_path ./logs
--x_timesteps 4 --y_timesteps 3 --exp_path ./logs
--x_timesteps 5 --y_timesteps 3 --exp_path ./logs
--x_timesteps 6 --y_timesteps 3 --exp_path ./logs
I want to speed up the processing by using xargs, but I don't know how to do this with file input. How can I parallelize a Python script by reading the file line by line and piping each line to xargs as a set of arguments?
I know I can solve the problem with a simple for-loop; what I need is the version driven by the file.
Typing this into the command line in the appropriate directory works:
for x in {3..6}; do
    printf '%s\0' "--x_timesteps=${x}" "--y_timesteps=3" "--exp_path=./logs"
done | xargs -0 -n 3 -P 8 python script.py
The for-loop style parallelization is derived from the answer to "Using xargs for parallel Python scripts".
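My best guess, adapting the loop above to read from the file instead, is something like this (an untested sketch; I believe GNU xargs's -L 1 hands each whitespace-split line to one invocation):
xargs -L 1 -P 8 python script.py < experiments.txt
but I don't know whether combining -L with -P like this is correct or idiomatic.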
CodePudding user response:
IMHO, it is simpler and more controllable with GNU Parallel like this:
parallel --dry-run --colsep ' ' python script.py :::: experiments.txt
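Since --dry-run only prints the commands instead of running them, the file above should produce output along these lines:
python script.py --x_timesteps 3 --y_timesteps 3 --exp_path ./logs
python script.py --x_timesteps 4 --y_timesteps 3 --exp_path ./logs
python script.py --x_timesteps 5 --y_timesteps 3 --exp_path ./logs
python script.py --x_timesteps 6 --y_timesteps 3 --exp_path ./logs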
You can simply add or remove --dry-run to debug. You can add --eta or --bar for progress reports. You can distribute tasks across multiple hosts. You can easily do fail/retry processing. You can extract basenames, filenames, and directory names from parameters. You can do permutations of parameters. And so on.
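For example, a sketch combining a few of those features (the flags are standard GNU Parallel options; experiments.log is just a hypothetical log path):
# progress bar, per-job log, retry failed jobs up to 2 times
parallel --bar --joblog experiments.log --retries 2 --colsep ' ' python script.py :::: experiments.txt
# permutations: every value of the first source crossed with every value of the second
parallel python script.py --x_timesteps {1} --y_timesteps {2} --exp_path ./logs ::: 3 4 5 6 ::: 3
Re-running the first command with --resume-failed added should retry only the jobs recorded as failed in the job log.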