Handling bash system variables and slurm environmental variables in a wrapper script

Time:09-17

Problem: Inspired by this thread, I'm trying to write a wrapper script that submits SLURM array jobs using bash variables. However, SLURM environment variables such as $SLURM_ARRAY_TASK_ID come through as empty.

I suspect it has to do with how test_wrapper.sh parses the not-yet-defined SLURM variable, but I can't find a solution.

Below is a minimal example with a simple Python script that takes an array ID as an input argument. When it is called through the bash wrapper script, the Python script crashes because it receives an empty value.

test_wrapper.sh :

#!/bin/bash
for argument in "$@"
do
  key=$(echo $argument | cut -f 1 -d'=')
  value=$(echo $argument | cut -f 2 -d'=')
  case "$key" in
    "job_name")     job_name="$value" ;;
    "cpus")         cpus="$value" ;;
    "memory")       memory="$value" ;;
    "time")         time="$value" ;;
    "array")        array="$value" ;;
    *)
  esac
done

sbatch <<EOT
#!/bin/bash
#SBATCH --account=foobar
#SBATCH --cpus-per-task=${cpus:-1}
#SBATCH --mem-per-cpu=${memory:-1}GB
#SBATCH --time=${time:-00:01:00}
#SBATCH --array=${array:-1-2}
#SBATCH --job-name=${job_name:-Default_Job_Name}

if [ -z "$SLURM_ARRAY_TASK_ID" ]
then
      echo "The array ID \$SLURM_ARRAY_TASK_ID is empty"
else
      echo "The array ID \$SLURM_ARRAY_TASK_ID is NOT empty"
fi

srun python foo.py -a $SLURM_ARRAY_TASK_ID

echo "Job finished with exit code $?"

EOT

where foo.py is:

import argparse

def main(args):
  print('array number is : {}'.format(args.array_number))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", "--array_number",
        help="the value passed from SLURM_ARRAY_TASK_ID"
        )
    args = parser.parse_args()
    main(args)

$ cat slurm-123456789_1.out yields:

The array ID 1 is empty
usage: foo.py [-h] [-a ARRAY_NUMBER]
foo.py: error: argument -a/--array_number: expected one argument
srun: error: nc10931: task 0: Exited with exit code 2
Job finished with exit code 0

I find it strange that "The array ID 1 is empty" correctly prints the value of $SLURM_ARRAY_TASK_ID (??)

CodePudding user response:

So according to this page:

Job arrays will have two additional environment variable set. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array. SLURM_ARRAY_TASK_ID will be set to the job array index value.

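That suggests to me that sbatch sets these for you when the job runs. Because the heredoc delimiter is unquoted, the shell running the wrapper expands every unescaped $SLURM_ARRAY_TASK_ID while it reads the heredoc, before sbatch ever sees the script, and at that point the variable is still empty. You therefore need to escape all instances of $SLURM_ARRAY_TASK_ID in the script you pass via the heredoc so that they are not substituted prematurely.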

The two options for this are:

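  1. If you don't want any expansions to occur at all, quote the heredoc delimiter. Note that this also leaves ${cpus:-1} and the other wrapper variables in the #SBATCH lines unexpanded, so it does not suit this wrapper as written.
sbatch <<"EOT"
<your script here>
EOT
  2. If you need some expansions to occur but want to disable others, then escape the ones that should not be expanded by putting a \ in front of them, as you have already done in the echo lines of your existing script.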
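The difference between the two options can be checked without a cluster by sending the heredoc to cat instead of sbatch. This is only a sketch: cat is a stand-in for sbatch, and the value of cpus is made up for illustration.

```shell
#!/bin/bash
# Stand-in demo: cat replaces sbatch so the generated text can be inspected.
cpus=4                      # wrapper-side variable, meant to be expanded at submit time
unset SLURM_ARRAY_TASK_ID   # not defined in the submitting shell

# Unquoted delimiter: the wrapper's shell expands everything that is not escaped.
unquoted=$(cat <<EOT
cpus=${cpus:-1}
bare=$SLURM_ARRAY_TASK_ID
escaped=\$SLURM_ARRAY_TASK_ID
EOT
)

# Quoted delimiter: nothing is expanded, including ${cpus:-1}.
quoted=$(cat <<"EOT"
cpus=${cpus:-1}
bare=$SLURM_ARRAY_TASK_ID
EOT
)

printf '%s\n' "--- unquoted ---" "$unquoted" "--- quoted ---" "$quoted"
```

In the unquoted case the bare variable disappears and only the escaped one survives as literal text for the job to expand later. In the quoted case every line is passed through untouched.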

CodePudding user response:

Thanks to the feedback posted in the comments I was able to fix the issue. Posting a "fixed" version of the wrapper script below.

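In short, the solution is to escape $SLURM_ARRAY_TASK_ID inside the heredoc (written as \$SLURM_ARRAY_TASK_ID) so that it is expanded when the job runs rather than when the wrapper submits it.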

#!/bin/bash
for argument in "$@"
do
  # Quote the argument so values with spaces or glob characters survive intact
  key=$(echo "$argument" | cut -f 1 -d'=')
  value=$(echo "$argument" | cut -f 2 -d'=')
  case "$key" in
    "job_name")     job_name="$value" ;;
    "cpus")         cpus="$value" ;;
    "memory")       memory="$value" ;;
    "time")         time="$value" ;;
    "array")        array="$value" ;;
    *)
  esac
done

{ tee /dev/stderr | sbatch; } <<EOT
#!/bin/bash
#SBATCH --account=foobar
#SBATCH --cpus-per-task=${cpus:-1}
#SBATCH --mem-per-cpu=${memory:-1}GB
#SBATCH --time=${time:-00:01:00}
#SBATCH --array=${array:-1-2}
#SBATCH --job-name=${job_name:-Default_Job_Name}

if [ -z "\$SLURM_ARRAY_TASK_ID" ]
then
      echo "The array ID \$SLURM_ARRAY_TASK_ID is empty"
else
      echo "The array ID \$SLURM_ARRAY_TASK_ID is NOT empty"
fi

python foo.py -a \$SLURM_ARRAY_TASK_ID

EOT

cat slurm-123456789_1.out yields:

The array ID 1 is NOT empty
array number is : 1

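Note: the { tee /dev/stderr | sbatch; } group is not necessary, but it is very useful for debugging because it prints the generated script to stderr, after expansion, while still passing it on to sbatch (thanks, Charles Duffy).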
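The tee trick can be tried without a scheduler. In the sketch below, wc -l is a stand-in for sbatch and the value of array is hypothetical. The expanded script appears on stderr, and the consumer still receives the same text on its stdin.

```shell
#!/bin/bash
array="1-3"   # hypothetical wrapper argument

# tee copies the expanded heredoc to stderr and passes it on unchanged
# to the consumer. wc -l stands in for sbatch here.
line_count=$({ tee /dev/stderr | wc -l; } <<EOT
#!/bin/bash
#SBATCH --array=${array:-1-2}
python foo.py -a \$SLURM_ARRAY_TASK_ID
EOT
)

echo "consumer received $line_count lines"
```

Reading the stderr copy shows at a glance whether a variable was substituted too early: an escaped variable should still appear as literal text, while wrapper arguments should already be filled in.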
