Get exit code from multiple bash scripts running in parallel


I am running 4 bash scripts in parallel; all 4 run at the same time:

./script1.sh & ./script2.sh & ./script3.sh & ./script4.sh

I would like to exit as soon as any of them fails. I was trying to use the exit code, but that doesn't seem to work for scripts launched in parallel. Is there a workaround? Any bash/python solution would be welcome.

CodePudding user response:

Here is a script that will do it for you. I borrowed (and modified) the non_blocking_wait function from here.

#!/bin/bash

# Run your scripts here... the following sleep commands are just an example
sleep 5 &
sleep 3 &
sleep 3 &

# Collect the PID of each background job into the array "pids"
pids=( $(jobs -p) )

echo "pids = ${pids[@]}"

# Return the exit code of the given PID if it has finished,
# or 127 if it is still running (relies on Linux's /proc filesystem)
non_blocking_wait()
{
    PID=$1
    if [ ! -d "/proc/$PID" ]; then
        # The process is gone: "wait" returns its remembered exit code
        wait $PID
        CODE=$?
    else
        # Still running: use 127 as a sentinel value
        CODE=127
    fi

    echo $CODE
}

while true; do

    # Count how many of the background jobs are still running
    n_running=$(jobs -l | grep -c "Running")

    if [ "${n_running}" -eq "0" ]; then
        echo "All processes finished successfully here..."
        exit 0
    fi

    if [ "${n_running}" -ne "${#pids[@]}" ]; then

        # At least one process has finished;
        # check whether it exited with an error
        for pid in "${pids[@]}"; do
            ret=$(non_blocking_wait ${pid})
            echo "non_blocking_wait ${pid} ret = ${ret}"
            if [ "${ret}" -ne "0" ] && [ "${ret}" -ne "127" ]; then
                echo "Process ${pid} exited with error ${ret}"

                # Here we can take any desirable action such as
                # killing all children and exiting the program:
                kill $(jobs -p) > /dev/null 2>&1
                exit 1
            fi
        done
    fi

    sleep 1
done

If you simply run it, it will exit 0 when all processes end:

$ ./script.sh 
pids = 32342 32343 32344
non_blocking_wait 32342 ret = 127 
non_blocking_wait 32343 ret = 0 
non_blocking_wait 32344 ret = 127 
non_blocking_wait 32342 ret = 127 
non_blocking_wait 32343 ret = 0 
non_blocking_wait 32344 ret = 0 
non_blocking_wait 32342 ret = 127 
non_blocking_wait 32343 ret = 0 
non_blocking_wait 32344 ret = 0 
All processes finished successfully here...

You can remove the argument from one of the sleep commands to make it fail and watch the program return immediately:

$ ./script.sh 
sleep: missing operand
Try 'sleep --help' for more information.
pids = 32394 32395 32396
non_blocking_wait 32394 ret = 127 
non_blocking_wait 32395 ret = 1 
Process 32395 exited with error 1
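
For completeness: on bash 4.3 or newer you can avoid the polling loop entirely with the built-in wait -n, which blocks until the next background job exits and returns that job's exit code. A minimal sketch of that approach (assumes bash 4.3+; the script names are the ones from the question):

#!/bin/bash

./script1.sh & ./script2.sh & ./script3.sh & ./script4.sh &

# Reap the jobs one by one as they finish; wait -n returns the
# exit code of whichever job terminates next (bash 4.3+ only)
for _ in 1 2 3 4; do
    wait -n
    code=$?
    if [ "$code" -ne 0 ]; then
        echo "A script failed with exit code $code; killing the rest" >&2
        kill $(jobs -p) 2>/dev/null
        exit 1
    fi
done

echo "All scripts finished successfully"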

CodePudding user response:

One solution is to use subprocess:

import subprocess
import time


def do_that(scripts):
    ps = [subprocess.Popen('./' + s, shell=True) for s in scripts]
    while True:
        done = True
        for p in ps:
            rc = p.poll()
            if rc is None:  # Script is still running
                done = False
            elif rc:
                # A non-zero return code means the script failed;
                # terminate all the others and give up
                print('This script failed:', p.args)
                running = set(ps) - {p}
                for i in running:
                    i.terminate()
                    print('Force terminate', i.args)
                return 1
        if done:
            print('All done.')
            return 0
        time.sleep(0.1)  # Avoid busy-waiting while polling


def timeit(func):
    def runner(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, 'cost:', round(end-start,1))
        return res
    return runner


@timeit
def main():
    scripts = ('script1.sh', 'script2.sh')
    do_that(scripts)


if __name__ == '__main__':
    main()
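
Note that poll() only checks the process status without blocking, which is what lets the loop watch all the scripts at once, and terminate() sends SIGTERM, so a script can still trap the signal and clean up before exiting. If the scripts are directly executable, you can also drop shell=True and pass the command as a list (e.g. subprocess.Popen(['./' + s])), which avoids launching an intermediate shell.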

CodePudding user response:

TL;DR

parallel --line-buffer --halt now,fail=1 ::: ./script?.sh

Actual answer

When running jobs in parallel, I find it useful to consider GNU Parallel because it makes so many aspects easy for you:

  • resource allocation
  • load spreading across multiple CPUs and across networks
  • logging and output tagging
  • error-handling - this aspect is of particular interest here
  • scheduling, restarting
  • input & output file name derivation and renaming
  • progress reporting

So, I have made 4 dummy jobs, script1.sh through script4.sh, like this:

#!/bin/bash
echo "script1.sh starting..."
sleep 5
echo "script1.sh complete"

The exception is script3.sh, which fails before the others:

#!/bin/bash
echo "script3.sh starting..."
sleep 2
echo "script3.sh dying"
exit 1

So, here's the default way to run 4 jobs in parallel, with the outputs of each all gathered and presented one after the other:

parallel ::: ./script*.sh
script3.sh starting...
script3.sh dying
script1.sh starting...
script1.sh complete
script4.sh starting...
script4.sh complete
script2.sh starting...
script2.sh complete

You can see script3.sh dies first and all its output is gathered and shown first, followed by the grouped output of the others.


Now let's do it again, but only buffer the output by line rather than waiting for the jobs to finish and gather it on a per-job basis:

parallel --line-buffer ::: ./script*.sh 
script1.sh starting...
script2.sh starting...
script3.sh starting...
script4.sh starting...
script3.sh dying
script1.sh complete
script2.sh complete
script4.sh complete

We can clearly see that script3.sh dies and exits before the others, but they still run to completion.


Now we want GNU Parallel to kill any running jobs the moment any single one dies:

parallel --line-buffer --halt now,fail=1 ::: ./script?.sh
script2.sh starting...
script1.sh starting...
script3.sh starting...
script4.sh starting...
script3.sh dying
parallel: This job failed:
./script3.sh

You can see that script3.sh died and none of the other jobs completed because GNU Parallel killed them.

It is far more flexible than I have shown. You can change now to soon so that, instead of killing the other jobs, it simply stops starting new ones. You can change fail=1 to success=50% so that it stops when half the jobs have exited successfully, and so on.
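
For example, sticking with the same test scripts:

# Stop launching new jobs after the first failure,
# but let already-running jobs finish:
parallel --line-buffer --halt soon,fail=1 ::: ./script?.sh

# Kill everything once half of the jobs have succeeded:
parallel --line-buffer --halt now,success=50% ::: ./script?.sh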

You can also add --eta or --bar for progress reports, distribute jobs across your network, and so on. Well worth reading up on, in these days when CPUs are getting fatter (more cores) rather than taller (more GHz).
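
For instance, a progress bar combined with the fail-fast behaviour above:

parallel --bar --halt now,fail=1 ::: ./script?.sh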
