I am running 4 bash scripts in parallel, all of them at the same time:
./script1.sh & ./script2.sh & ./script3.sh & ./script4.sh
I would like to exit as soon as any one of them fails. I was trying to use something like an exit code, but that doesn't seem to work for parallel scripts. Is there a workaround? Any bash/python solution would be welcome.
CodePudding user response:
Here is a script that will do it for you.
I borrowed (and modified) the non_blocking_wait function from here.
#!/bin/bash
# Run your scripts here... The following sleep commands are just an example
sleep 5 &
sleep 3 &
sleep 3 &
# Here we get the PID of each running job and put it in the array "pids"
pids=( $(jobs -p) )
echo "pids = ${pids[@]}"
# Return the exit code of a child that has already finished, or 127 if the
# process is still running (or has already been reaped)
non_blocking_wait()
{
    PID=$1
    if [ ! -d "/proc/$PID" ]; then
        # The process is gone: reap it and collect its exit code
        wait $PID
        CODE=$?
    else
        # Still running
        CODE=127
    fi
    echo $CODE
}
while true; do
    # Check how many processes are still running
    n_running=$(jobs -l | grep -c "Running")
    if [ "${n_running}" -eq "0" ]; then
        echo "All processes finished successfully here..."
        exit 0
    fi
    if [ "${n_running}" -ne "3" ]; then  # 3 == number of jobs started above
        # At least one process has finished here,
        # check whether it exited with an error
        for pid in "${pids[@]}"; do
            ret=$(non_blocking_wait ${pid})
            echo "non_blocking_wait ${pid} ret = ${ret}"
            if [ "${ret}" -ne "0" ] && [ "${ret}" -ne "127" ]; then
                echo "Process ${pid} exited with error ${ret}"
                # Here we can take any desirable action such as
                # killing all children and exiting the program:
                kill $(jobs -p) > /dev/null 2>&1
                exit 1
            fi
        done
    fi
    sleep 1
done
If you simply run it, it will exit 0 when all processes end:
$ ./script.sh
pids = 32342 32343 32344
non_blocking_wait 32342 ret = 127
non_blocking_wait 32343 ret = 0
non_blocking_wait 32344 ret = 127
non_blocking_wait 32342 ret = 127
non_blocking_wait 32343 ret = 0
non_blocking_wait 32344 ret = 0
non_blocking_wait 32342 ret = 127
non_blocking_wait 32343 ret = 0
non_blocking_wait 32344 ret = 0
All processes finished successfully here...
You can remove the parameter from one of the sleep commands to make it fail and see the program return immediately:
$ ./script.sh
sleep: missing operand
Try 'sleep --help' for more information.
pids = 32394 32395 32396
non_blocking_wait 32394 ret = 127
non_blocking_wait 32395 ret = 1
Process 32395 exited with error 1
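For what it's worth, if your bash is 4.3 or newer, the polling loop can be replaced by the wait -n builtin, which blocks until the next background job finishes and returns that job's exit status. A minimal sketch along the same lines (the kill-everything-and-exit reaction is the same as above):
#!/bin/bash
# Note the trailing & so that all four scripts run in the background
./script1.sh & ./script2.sh & ./script3.sh & ./script4.sh &

# One wait -n per job: each call blocks until the next job exits
# and returns that job's exit status
for _ in 1 2 3 4; do
    if ! wait -n; then
        echo "A script failed, killing the remaining ones" >&2
        kill $(jobs -p) > /dev/null 2>&1
        exit 1
    fi
done
echo "All scripts finished successfully"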
CodePudding user response:
One solution is to use subprocess:
import subprocess
import time


def do_that(scripts):
    ps = [subprocess.Popen('./' + s, shell=True) for s in scripts]
    while True:
        done = True
        for p in ps:
            rc = p.poll()
            if rc is None:  # Script is still running
                done = False
            elif rc:
                # rc == 0 means the script finished successfully,
                # anything else means it failed
                print('This script run failed:', p.args)
                running = set(ps) - {p}
                for i in running:
                    i.terminate()
                    print('Force terminate', i.args)
                return 1
        if done:
            print('All done.')
            return 0
        time.sleep(0.5)  # Avoid busy-waiting between polls


def timeit(func):
    def runner(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, 'cost:', round(end - start, 1))
        return res
    return runner


@timeit
def main():
    scripts = ('script1.sh', 'script2.sh')
    do_that(scripts)


if __name__ == '__main__':
    main()
CodePudding user response:
TL;DR
parallel --line-buffer --halt now,fail=1 ::: ./script?.sh
Actual answer
When running jobs in parallel, I find it useful to consider GNU Parallel because it makes so many aspects easy for you:
- resource allocation
- load spreading across multiple CPUs and across networks
- logging and output tagging
- error-handling - this aspect is of particular interest here
- scheduling, restarting
- input & output file name derivation and renaming
- progress reporting
So, I have made 4 dummy jobs, script1.sh through script4.sh, like this:
#!/bin/bash
echo "script1.sh starting..."
sleep 5
echo "script1.sh complete"
Except script3.sh, which fails before the others:
#!/bin/bash
echo "script3.sh starting..."
sleep 2
echo "script3.sh dying"
exit 1
So, here's the default way to run 4 jobs in parallel, with the outputs of each all gathered and presented one after the other:
parallel ::: ./script*.sh
script3.sh starting...
script3.sh dying
script1.sh starting...
script1.sh complete
script4.sh starting...
script4.sh complete
script2.sh starting...
script2.sh complete
You can see script3.sh dies first and all its output is gathered and shown first, followed by the grouped output of the others.
Now let's do it again, but buffer the output by line rather than waiting for each job to finish and gathering its output on a per-job basis:
parallel --line-buffer ::: ./script*.sh
script1.sh starting...
script2.sh starting...
script3.sh starting...
script4.sh starting...
script3.sh dying
script1.sh complete
script2.sh complete
script4.sh complete
We can clearly see that script3.sh dies and exits before the others, but they still run to completion.
Now we want GNU Parallel to kill any running jobs the moment any single one dies:
parallel --line-buffer --halt now,fail=1 ::: ./script?.sh
script2.sh starting...
script1.sh starting...
script3.sh starting...
script4.sh starting...
script3.sh dying
parallel: This job failed:
./script3.sh
You can see that script3.sh died and none of the other jobs completed because GNU Parallel killed them.
It is far more flexible than I have shown. You can change now to soon and, instead of killing other jobs, it will just not start any new ones. You can change fail=1 to success=50% so it will stop when half the jobs exit successfully, and so on.
You can also add --eta or --bar for progress reports, distribute jobs across your network, and so on. Well worth reading up on, in these days where CPUs are getting fatter (more cores) rather than taller (more GHz).