I have a script that looks like this:
for x in ...; do
    for y in ...; do
        # run several commands that depend on x and y and require a single GPU
        # (I also need to specify which GPU to use)
        command1 $x $y GPU0
        command2 $x $y GPU0
    done
done
# Some stuff after the loop
I have 4 GPUs and want to parallelize the loop. That is, for the current (x, y) iteration, I want to wait until some GPU is available, start the commands on it, and move on to the next iteration without waiting for the current one to finish. How do I do this?
I know about the flock command, so I could create a lock file for each GPU and use it to control access to that GPU. But, as I understand it, that requires knowing in advance which GPU the current (x, y) iteration is going to use.
Another concern is how to guarantee that every iteration uses the correct x and y. That is, when the loop moves on to the next iteration, x and y change, and that change must not be reflected in command1 $x $y GPU... from the previous iteration.
CodePudding user response:
I assume that you're not targeting macOS (which ships bash 3.2) because you're using GPUs, so here's a solution for bash 4 or later to get you started (you might want to improve on it by defining functions, traps, etc.):
#!/bin/bash

# GPUs that are currently free (only the keys matter)
declare -A avail_gpus=( [GPU0]= [GPU1]= [GPU2]= [GPU3]= )
# PID of each running job -> GPU it occupies
declare -a procs_gpus=( )

for x in x1 x2 x3
do
    for y in y1 y2 y3
    do
        # wait until at least one GPU is free
        while :
        do
            for pid in "${!procs_gpus[@]}"
            do
                kill -0 "$pid" 2> /dev/null && continue  # job still running
                echo "freed ${procs_gpus[pid]}" # demo
                avail_gpus["${procs_gpus[pid]}"]=
                unset "procs_gpus[$pid]"
            done
            (( ${#avail_gpus[@]} > 0 )) && break
            sleep .5
        done

        # pick any free GPU
        for gpu in "${!avail_gpus[@]}"; do break; done

        # run the job in the background
        {
            echo "$x" "$y" "$gpu" # demo
            sleep "$((1 + RANDOM % 10))" # demo
            #command1 "$x" "$y" "$gpu"
            #command2 "$x" "$y" "$gpu"
        } &

        procs_gpus[$!]=$gpu
        unset "avail_gpus[$gpu]"
    done
done

wait
With the above code you'll get something like:
x1 y1 GPU2
x1 y2 GPU3
x1 y3 GPU0
x2 y1 GPU1
freed GPU0
x2 y2 GPU0
freed GPU2
x2 y3 GPU2
freed GPU3
x3 y1 GPU3
freed GPU0
freed GPU2
x3 y2 GPU2
x3 y3 GPU0
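On the question's second concern (whether a running job can see x and y change): a command group started with & runs in a subshell, which gets its own copy of the variables at the moment it is started, so later assignments in the parent loop do not reach it. The following is a minimal sketch of that behaviour only; the function name capture_demo and the temporary file are made up for the illustration and are not part of the script above.

```shell
#!/bin/bash
# Hypothetical demo: each background job keeps the x and y it was started with.
capture_demo() {
    local out x y
    out=$(mktemp)
    for x in x1 x2; do
        for y in y1 y2; do
            {
                sleep 0.2                 # by now the parent loop has moved on
                echo "$x $y" >> "$out"    # still the values from this iteration
            } &
        done
    done
    wait                                  # wait for all background jobs
    sort "$out"                           # one line per (x, y) pair
    rm -f "$out"
}

capture_demo
```

Every job sleeps until the loop has long finished, yet all four pairs (x1 y1, x1 y2, x2 y1, x2 y2) show up in the result rather than four copies of the last pair.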
CodePudding user response:
If you have GNU Parallel ({%} is the job slot number, which runs from 1 to 4 with -j4):
parallel -j4 'echo Do {1} and {2} on GPU {%}; sleep 1.$RANDOM;' ::: a b c ::: X Y Z
To follow the progress, use --lb:
parallel -j4 --lb 'echo Do {1} and {2} on GPU {%}; sleep 1.$RANDOM; echo GPU {%} done' ::: a b c ::: X Y Z
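Since {%} counts from 1 while the question's GPUs are named GPU0 to GPU3, the slot number has to be shifted by one if the 0-based names matter. A rough sketch of how both commands from the question could be run per (x, y) pair follows; the echo lines stand in for the real command1 and command2, and the wrapper function run_on_gpus and the check for GNU Parallel are only there for the illustration.

```shell
#!/bin/bash
# Hypothetical wrapper: run "command1" and "command2" for every (x, y) pair,
# at most 4 pairs at a time, one pair per job slot.
run_on_gpus() {
    if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        echo "GNU Parallel not found" >&2
        return 0
    fi
    # {%} is the 1-based job slot, so slot N maps to GPU(N-1).
    # Replace the echo lines with the real commands.
    parallel -j4 '
        gpu=GPU$(( {%} - 1 ))
        echo command1 {1} {2} "$gpu"
        echo command2 {1} {2} "$gpu"
    ' ::: a b c ::: X Y Z
}

run_on_gpus
```

Because a job slot only ever runs one job at a time, two pairs never share a GPU, and each job receives its own {1} and {2} values as literal text in its command line.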