I run multiple serial jobs on an HPC cluster. For example, if I have 10 simulations, I request 10 cores and use one core per simulation. However, the simulations finish at different times, and as soon as one simulation completes, all the others stop as well. How do I hold the job script so that even if one simulation has completed, the others keep running; in other words, so that the job script stays alive on the cluster? An example of my job script:
#!/bin/bash
#SBATCH --job-name=CaseName # name of the job
#SBATCH --ntasks=60 # number of requested cores
#SBATCH --cpus-per-task=1
#SBATCH --time=7-00:00:00 # time limit
#SBATCH --partition=core64 # queue
cd Folder1
for i in {1..5}
do
    cd Folder$i
    for j in {1..6}
    do
        cd SubFolder$j
        application > log 2>&1 &
        cd ..
    done
    cd ..
done
cd ..
cd LastFolder
application > log 2>&1
Is there any command I can add to the job script so that it keeps running on the HPC system until every simulation has finished?
CodePudding user response:
You need a wait at the end of your script: you run the jobs in the background, and you want the script to exit only after all of them have finished.
From man bash:
wait [-fn] [-p varname] [id ...]
Wait for each specified child process and return
its termination status. ...
...
If id is not given, wait waits for all running background jobs...
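For example, the end of your script would then look like this (a minimal sketch; application stands for your solver, as in the question, and the loops starting the background runs are elided):

application > log 2>&1 &     # each simulation started in the background, as in your loops
...
cd LastFolder
application > log 2>&1       # the final run, in the foreground

wait                         # block here until every background run has exited

Slurm only releases the allocation when the batch script itself exits, so the wait is what keeps the job on the cluster until all simulations are done. If you instead want to react as soon as any single simulation finishes, wait -n (bash 4.3 or newer) returns after the first background job exits rather than after all of them.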
CodePudding user response:
There's something wrong with your cd logic. Perhaps try running the cd and the application in a subshell, e.g.
( cd SubFolder$j && application > log 2>&1 ) &
Note that the & goes after the closing parenthesis: that makes the whole subshell the background job, so it remains a direct child of your script and a final wait can still track it. That way you can be sure every command runs concurrently and in its own subdirectory without the runs impacting each other.
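Combining the two answers, the body of the batch script could look like this (a sketch assuming the directory layout implied by your loop bounds; adjust the relative paths to your actual tree, and application again stands for your solver):

for i in {1..5}
do
    for j in {1..6}
    do
        # each run gets its own subshell, so the parent script never changes directory
        ( cd Folder$i/SubFolder$j && application > log 2>&1 ) &
    done
done
( cd LastFolder && application > log 2>&1 ) &

wait    # the Slurm job stays alive until every simulation has exited

Using && rather than ; means that if a cd fails, that run is skipped instead of application being launched in the wrong directory.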