How can I limit the execution time of srun within a SLURM job?

Time: 08-01

I am enqueuing a job with sbatch in SLURM, with a script similar to this:

#!/bin/bash

#SBATCH -q regular
#SBATCH -t 02:00:00
#SBATCH -N 1  

#Some other lines

# Here I iterate over the configurations i = 0 .. NCONF
i=0
while [ "$i" -le "$NCONF" ];
do
    env WHATEVER_ENV_VAR=$i srun -n 64 myapp arg0 >> output
    i=$((i + 1))
done

Basically, the loop inside the job tests NCONF configurations of the environment variable $WHATEVER_ENV_VAR. Everything works well. However, depending on the value of $WHATEVER_ENV_VAR, the MPI execution can take minutes or hours, and I am only interested in configurations with short execution times.

I am wondering if it is possible to pass an argument to srun to limit the execution time. For example, if the current srun execution is taking more than 10 minutes, then abort it and continue to the next iteration.

I have been reading the srun documentation but I cannot find such an option. Any ideas?

Thanks

CodePudding user response:

From the doc of srun:

-t, --time=<time>
Set a limit on the total run time of the job allocation.

This should work even when srun is invoked inside an sbatch script. When the limit is reached, SLURM kills the job step and srun returns a nonzero exit code, so the loop simply moves on to the next iteration.
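Applied to the loop from the question, a sketch might look like this (assuming a 10-minute cap per configuration; the step's time limit cannot exceed the job's own `-t 02:00:00` allocation):

```bash
# Each srun step is killed if it runs longer than 10 minutes;
# the loop then continues with the next configuration.
i=0
while [ "$i" -le "$NCONF" ];
do
    if ! env WHATEVER_ENV_VAR=$i srun -t 00:10:00 -n 64 myapp arg0 >> output
    then
        echo "configuration $i exceeded the limit or failed, skipping" >&2
    fi
    i=$((i + 1))
done
```

Note that SLURM sends SIGTERM shortly before the limit and then SIGKILL, so `myapp` gets a brief window to clean up if it handles the signal.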
