Reading lines of file into xargs for parallel python script


I am trying to execute a Python script in parallel, using the lines of a file as arguments to the script. The file is named experiments.txt and might look like this:

--x_timesteps 3 --y_timesteps 3 --exp_path ./logs
--x_timesteps 4 --y_timesteps 3 --exp_path ./logs
--x_timesteps 5 --y_timesteps 3 --exp_path ./logs
--x_timesteps 6 --y_timesteps 3 --exp_path ./logs

I want to speed up the processing with xargs, but I don't know how to do this with file input. How can I parallelize the script by reading the file line by line and piping the lines to xargs?

I know I can solve this problem with a simple for-loop; however, I need to know how to do it with file input.

Typing this into the command line in the appropriate directory works:

for x in {3..6}; do
    printf '%s\0' "--x_timesteps=${x}" "--y_timesteps=3" "--exp_path=./logs"
done | xargs -0 -n 3 -P 8 python script.py
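
Here printf emits each option as a NUL-terminated string, and xargs -0 reads those NUL-delimited tokens, handing them to script.py three at a time (-n 3) while running up to eight invocations concurrently (-P 8).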

The for-loop style parallelization is derived from the answer to "Using xargs for parallel Python scripts".

CodePudding user response:

IMHO, it is simpler and more controllable with GNU Parallel like this:

parallel --dry-run --colsep ' ' python script.py :::: experiments.txt
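
With --dry-run, parallel only prints the commands it would run instead of executing them; for the experiments.txt above, the output should look something like:

python script.py --x_timesteps 3 --y_timesteps 3 --exp_path ./logs
python script.py --x_timesteps 4 --y_timesteps 3 --exp_path ./logs
python script.py --x_timesteps 5 --y_timesteps 3 --exp_path ./logs
python script.py --x_timesteps 6 --y_timesteps 3 --exp_path ./logs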

Remove --dry-run to actually run the jobs. You can add --eta or --bar for progress reports, distribute tasks across multiple hosts, easily do fail/retry processing, extract basenames, file names, and directory names from parameters, generate permutations of parameters, and so on.
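
If you would rather stay with plain xargs as the question asks, here is a minimal sketch, assuming a GNU or BSD xargs that supports -P; -L 1 makes xargs take one line of the file per invocation, splitting it on whitespace into separate arguments:

# run script.py once per line of experiments.txt, up to 8 jobs at a time
xargs -L 1 -P 8 python script.py < experiments.txt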
