I am enqueuing a job with sbatch in SLURM, with a script similar to:
#!/bin/bash
#SBATCH -q regular
#SBATCH -t 02:00:00
#SBATCH -N 1
#Some other lines
# Here I iterate through i < NCONF
i=0
while [ "$i" -lt "$NCONF" ]
do
    env WHATEVER_ENV_VAR=$i srun -n 64 myapp arg0 >> output
    i=$((i+1))
done
Basically, the loop inside the job tests NCONF configurations of the environment variable $WHATEVER_ENV_VAR. Everything works well. However, depending on the value of $WHATEVER_ENV_VAR, the MPI execution can take minutes or hours, and I am only interested in configurations with short execution times.
I am wondering whether it is possible to pass an argument to srun to limit the execution time. For example, if the current srun execution takes more than 10 minutes, abort it and continue to the next iteration.
I have read the srun documentation but cannot find an option for that. Any ideas?
Thanks
CodePudding user response:
From the srun documentation:

-t, --time=<time>
    Set a limit on the total run time of the job allocation.

It should also work when srun is launching a job step inside an sbatch script: the limit then applies to that step rather than to the whole allocation.
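A minimal sketch of the question's loop with a 10-minute cap per step, assuming the placeholders (`myapp`, `NCONF`, `WHATEVER_ENV_VAR`) from the question; with `-t`, a bare number is interpreted by SLURM as minutes:

```shell
#!/bin/bash
#SBATCH -q regular
#SBATCH -t 02:00:00
#SBATCH -N 1

i=0
while [ "$i" -lt "$NCONF" ]
do
    # -t 10 caps this single job step at 10 minutes; SLURM kills the
    # step when the limit is reached, and the loop moves on to the next i.
    env WHATEVER_ENV_VAR=$i srun -t 10 -n 64 myapp arg0 >> output
    i=$((i+1))
done
```

Note that the `#SBATCH -t 02:00:00` limit on the whole allocation still applies: steps killed at 10 minutes do not extend the total time available to the job.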