Running different tasks on individual resource sets within the same node

Time:09-10

I asked about an issue I had with a different approach in an earlier question ("Having issues running mpi4py on large HPC system. Receiving startup errors and sometimes variable errors"). I'm now trying two other approaches, so far with no success: every example below still puts the same task on each of the six resource sets.

Background: I'm attempting to distribute predictions across the resource sets (RS) on a node. Each resource set contains 1 GPU and 7 CPUs, and there are six sets per node. Once an RS task completes, it should move on to the next prediction in its list (part00.lst through part05.lst; in theory, one list per RS).

The first approach looks something like this (a submission bash script calls it using jsrun -r6 -g1 -a1 -c7 -b packed:7 -d packed -l gpu-cpu):

#!/bin/bash
output=/path/  ##where completed predictions will be collected

for i in {0..5}; do
  target=part0${i}.lst
  ........     ##the singularity job script to execute using $target and $output variables
done
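A likely reason the first approach repeats work: with -r6 -a1, jsrun starts one copy of this script in each of the six resource sets on the node, and every copy runs the same loop over all six lists. A minimal sketch of a per-task alternative picks one list from the task's rank instead of looping. It assumes jsrun exports the rank as JSM_NAMESPACE_RANK (IBM Job Step Manager does this on Summit-style systems; check with env inside a job step). pick_target is a hypothetical helper, and the singularity call stays elided as in the original.

```shell
#!/bin/bash
# Sketch: launched once per resource set by a single
#   jsrun -r6 -g1 -a1 -c7 -b packed:7 -d packed -l gpu-cpu
# Assumption: jsrun exports JSM_NAMESPACE_RANK (0..5 on one node).
output=/path/  ##where completed predictions will be collected

# Hypothetical helper: map a task rank to its list file (3 -> part03.lst).
pick_target() {
  printf 'part%02d.lst\n' "$1"
}

rank=${JSM_NAMESPACE_RANK:-0}   # fall back to 0 outside jsrun (e.g. when testing)
target=$(pick_target "$rank")
echo "rank ${rank}: using ${target}, writing to ${output}"
## ... the singularity job script, run once with $target and $output
```

On more than one node the rank keeps counting past 5, so the lists would have to cover every task; JSM_NAMESPACE_LOCAL_RANK, if your installation exports it, gives the per-node index instead.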

The next attempt uses simultaneous job steps via UNIX process backgrounding (which others have been able to adapt for similar work, though with different jobs and tasks). Here I created six separate bash files, each using its corresponding input file ($target, i.e. part00.lst through part05.lst):

#!/bin/bash

## Various submission flags here

for i in {0..5}; do
  jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_0${i}.sh &
done
wait

I also tried simply hardcoding the six separate bash files:

#!/bin/bash

jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_00.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_01.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_02.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_03.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_04.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_05.sh &
wait

Thanks for any help! I'm still quite new to all of this!

Answer:

Okay, attempt number two using simultaneous job steps/UNIX process backgrounding was nearly correct!

It now works. An example for one node:

Submission script:

#!/bin/bash

## Various submission flags here

for i in {0..5}; do
  jsrun -n 1 -r 1 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_0${i}.sh &
done
wait

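It was only a matter of incorrect flags (-n 1 -r 1, not -r 6). With -r 6, each backgrounded jsrun asks for six resource sets, i.e. the whole node, so every batch_run_0N.sh was launched once in each of the six RS. With -n 1 -r 1, each job step gets exactly one RS, and the six steps run side by side.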
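To double-check the launch lines before spending an allocation, the corrected loop can be wrapped so it only prints the six commands. This is a sketch, not the poster's script: DRY_RUN and launch_all are hypothetical names, and it assumes the job runs under LSF, which sets LSB_JOBID inside a batch job; whenever LSB_JOBID is unset, the script prints instead of launching. The per-flag comments summarize the jsrun options; confirm them against your site's jsrun documentation.

```shell
#!/bin/bash
## Various submission flags here

# DRY_RUN is a switch of this script, not a jsrun option: when 1, the six
# commands are printed instead of run. LSB_JOBID is set by LSF inside a
# batch job, so outside one (e.g. on a login node) we only print.
DRY_RUN=${DRY_RUN:-0}
[[ -z ${LSB_JOBID:-} ]] && DRY_RUN=1

launch_all() {
  local i
  for i in {0..5}; do
    # -n 1 -r 1 : one resource set for this job step (not the whole node)
    # -g 1 -c 7 : 1 GPU and 7 cores in that resource set
    # -a 1      : one task (one copy of the batch script) in it
    # -brs      : bind the task to its resource set
    local -a cmd=(jsrun -n 1 -r 1 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu
                  bash "batch_run_0${i}.sh")
    if [[ $DRY_RUN == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" &   # background so all six job steps run at once
    fi
  done
  wait   # return only after every backgrounded job step has finished
}

launch_all
```

On a login node this prints six lines, each with -n 1 -r 1 and a different batch_run_0N.sh; inside the batch job it launches them in the background and waits for all six.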
