How to start multiple job instances in a shell script to process multiple files in a directory?


#!/bin/bash

data_dir=./all
for file_name in "$data_dir"/*
do
  echo "$file_name"
  python process.py "$file_name"
done

For example, this script processes the files in a directory sequentially, one per iteration of the for loop. Is it possible to start multiple process.py instances so the files are processed concurrently? I want to do this in a shell script.

CodePudding user response:

It's better to do this from Python itself: use os.listdir to enumerate the files and subprocess.Popen to start a new process for each one.
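
A minimal sketch of that idea, reusing the ./all directory and process.py from the question. Note that nothing here limits how many processes run at once, so it is best suited to small directories:

import os
import subprocess

data_dir = "./all"

# start one process.py instance per file; Popen does not block,
# so all instances run concurrently
procs = []
for file_name in os.listdir(data_dir):
    path = os.path.join(data_dir, file_name)
    print(path)
    procs.append(subprocess.Popen(["python", "process.py", path]))

# wait for every instance to finish
for proc in procs:
    proc.wait()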

CodePudding user response:

I have another possibility for you, if still needed: use the screen command to run each command in its own detached session.

Here is an example:

#!/bin/bash

data_dir=./all
for file_name in "$data_dir"/*
do
  echo "$file_name"
  # -dm starts screen detached, running the command in the background
  screen -dm python process.py "$file_name"
done
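
The sessions run in the background; you can list them with screen -ls and reattach to one with screen -r <session> if you want to watch a particular job.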

CodePudding user response:

With GNU Parallel, like this:

parallel python process.py {} ::: all/*

By default it runs N jobs in parallel, where N is the number of CPU cores you have; you can pass -j4 to run just 4 at a time, for example.
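Concretely, with the same process.py and all/ directory as above:

parallel -j4 python process.py {} ::: all/*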

It has many, many options (a combined example follows this list) for:

  • logging,
  • splitting/chunking inputs,
  • tagging/separating output,
  • staggering job starts,
  • massaging input parameters,
  • fail and retry handling,
  • distributing jobs and data to other machines,
  • and so on...
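
For example, a sketch combining a few of these (flag names from the GNU Parallel manual):

parallel --tag --joblog jobs.log --retries 2 --delay 0.5 python process.py {} ::: all/*

Here --tag prefixes each output line with the input file it came from, --joblog records per-job runtimes and exit codes in jobs.log, --retries 2 re-runs failed jobs up to twice, and --delay 0.5 staggers job starts by half a second.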

Try putting [gnu-parallel] in the Stack Overflow search box.
