Problem: Inspired by this thread, I'm trying to write a wrapper script that submits SLURM array jobs using bash variables. However, SLURM environment variables such as $SLURM_ARRAY_TASK_ID end up empty inside the submitted job.
I suspect it has something to do with how test_wrapper.sh expands the not-yet-defined SLURM variable, but I can't find a solution.
Below is a minimal example with a simple Python script that should take the array ID as an input argument. When it is called from the bash wrapper script, the Python script crashes because it receives an empty value.
test_wrapper.sh:
#!/bin/bash
for argument in "$@"
do
    key=$(echo $argument | cut -f 1 -d'=')
    value=$(echo $argument | cut -f 2 -d'=')
    case "$key" in
        "job_name") job_name="$value" ;;
        "cpus") cpus="$value" ;;
        "memory") memory="$value" ;;
        "time") time="$value" ;;
        "array") array="$value" ;;
        *)
    esac
done
sbatch <<EOT
#!/bin/bash
#SBATCH --account=foobar
#SBATCH --cpus-per-task=${cpus:-1}
#SBATCH --mem-per-cpu=${memory:-1}GB
#SBATCH --time=${time:-00:01:00}
#SBATCH --array=${array:-1-2}
#SBATCH --job-name=${job_name:-Default_Job_Name}
if [ -z "$SLURM_ARRAY_TASK_ID" ]
then
    echo "The array ID \$SLURM_ARRAY_TASK_ID is empty"
else
    echo "The array ID \$SLURM_ARRAY_TASK_ID is NOT empty"
fi
srun python foo.py -a $SLURM_ARRAY_TASK_ID
echo "Job finished with exit code $?"
EOT
where foo.py is:
import argparse

def main(args):
    print('array number is : {}'.format(args.array_number))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", "--array_number",
                        help="the value passed from SLURM_ARRAY_TASK_ID"
                        )
    args = parser.parse_args()
    main(args)
Running cat slurm-123456789_1.out yields:
The array ID 1 is empty
usage: foo.py [-h] [-a ARRAY_NUMBER]
foo.py: error: argument -a/--array_number: expected one argument
srun: error: nc10931: task 0: Exited with exit code 2
Job finished with exit code 0
I find it strange that the line "The array ID 1 is empty" prints the correct value of $SLURM_ARRAY_TASK_ID while the same script reports the variable as empty.
CodePudding user response:
So according to this page:
Job arrays will have two additional environment variable set. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array. SLURM_ARRAY_TASK_ID will be set to the job array index value.
That suggests to me that sbatch is supposed to set these for you. In that case, you need to escape all instances of $SLURM_ARRAY_TASK_ID in the script you pass via the heredoc so that they don't get prematurely substituted before sbatch can set the relevant environment variable.
The two options for this are:
- If you don't want any expansions to occur at all, quote the heredoc delimiter.
sbatch <<"EOT"
<your script here>
EOT
- If you need some expansions to occur but want to disable others, escape the ones that should not be expanded by putting a \ in front of them, as you have already done for the echo lines in your existing script.
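To see both options side by side without submitting anything, here is a minimal sketch that feeds the heredoc to cat instead of sbatch. cat is only a stand-in so the generated text can be inspected, and the value cpus=4 is an arbitrary assumption for illustration.

```shell
#!/bin/bash
# Stand-in for a wrapper option; in the question this comes from "cpus=...".
cpus=4
# At submit time SLURM has not set this variable yet, so mimic that here.
unset SLURM_ARRAY_TASK_ID

# Unquoted delimiter: the submitting shell expands everything not escaped.
unquoted=$(cat <<EOT
#SBATCH --cpus-per-task=${cpus:-1}
python foo.py -a $SLURM_ARRAY_TASK_ID
python foo.py -a \$SLURM_ARRAY_TASK_ID
EOT
)

# Quoted delimiter: nothing is expanded, not even ${cpus:-1}.
quoted=$(cat <<"EOT"
#SBATCH --cpus-per-task=${cpus:-1}
python foo.py -a $SLURM_ARRAY_TASK_ID
EOT
)

printf '%s\n' "--- unquoted ---" "$unquoted" "--- quoted ---" "$quoted"
```

In the unquoted case, the unescaped line has already lost its argument by the time sbatch would read it, while the escaped line still carries the literal variable name for the job to expand. This also accounts for the output in the question: the echo lines were escaped, so they printed 1 at run time, whereas the unescaped test and the python line had been reduced to an empty string when the wrapper ran.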
CodePudding user response:
Thanks to the feedback posted in the comments I was able to fix the issue. Posting a "fixed" version of the wrapper script below.
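In short, the solution is to escape $SLURM_ARRAY_TASK_ID (write \$SLURM_ARRAY_TASK_ID) everywhere it appears inside the heredoc, so that it is expanded when the job runs rather than when the wrapper submits it.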
#!/bin/bash
# Parse key=value arguments passed to the wrapper.
for argument in "$@"
do
    key=$(echo "$argument" | cut -f 1 -d'=')
    value=$(echo "$argument" | cut -f 2 -d'=')
    case "$key" in
        "job_name") job_name="$value" ;;
        "cpus") cpus="$value" ;;
        "memory") memory="$value" ;;
        "time") time="$value" ;;
        "array") array="$value" ;;
        *) ;;  # ignore unrecognized keys
    esac
done
# Unescaped ${...} is filled in now; \$... is left for the job to expand.
{ tee /dev/stderr | sbatch; } <<EOT
#!/bin/bash
#SBATCH --account=foobar
#SBATCH --cpus-per-task=${cpus:-1}
#SBATCH --mem-per-cpu=${memory:-1}GB
#SBATCH --time=${time:-00:01:00}
#SBATCH --array=${array:-1-2}
#SBATCH --job-name=${job_name:-Default_Job_Name}
if [ -z "\$SLURM_ARRAY_TASK_ID" ]
then
    echo "The array ID \$SLURM_ARRAY_TASK_ID is empty"
else
    echo "The array ID \$SLURM_ARRAY_TASK_ID is NOT empty"
fi
python foo.py -a \$SLURM_ARRAY_TASK_ID
EOT
Now cat slurm-123456789_1.out yields:
The array ID 1 is NOT empty
array number is : 1
Note: the { tee /dev/stderr | sbatch; } construct is not necessary, but it is very useful for debugging because it also prints the generated script to stderr (thanks, Charles Duffy).
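In the same debugging spirit, one possible variation is to build the job script in a function so it can be printed without contacting SLURM at all. This is only a sketch: render_job is a hypothetical helper name that is not part of the script above, and the job script is shortened for illustration.

```shell
#!/bin/bash
# render_job is a hypothetical helper (not in the original wrapper).
# It only prints the job script; nothing is submitted here.
render_job() {
    cat <<EOT
#!/bin/bash
#SBATCH --array=${array:-1-2}
#SBATCH --job-name=${job_name:-Default_Job_Name}
python foo.py -a \$SLURM_ARRAY_TASK_ID
EOT
}

# Dry run: inspect the text that sbatch would receive.
render_job

# Real submission (commented out so the sketch runs without SLURM):
# render_job | sbatch
```

If the printed script still shows the literal text $SLURM_ARRAY_TASK_ID, the escaping is right; an empty spot after -a means the wrapper's shell expanded it too early.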