parallelize with bash for loop

Time:10-27

I have a script that runs the same steps for multiple files like this:

f_list='a b c d'
for f in $f_list
do  
    echo "start process 1"
    code to start process 1
    echo "start process 2"
    code to start process 2
    echo "start process 3" #This step gets the input from step 2
    code to start process 3 & #Takes long
    echo "process 3 done for ${f} at `date`"
done

I want to do this: once step 3 starts for one element of the list, move on to the next element without waiting for step 3 to finish, but print the time step 3 finished once it does. I thought about adding & at the end of the step 3 command, but that is not exactly what I want, because the completion message would then be printed immediately rather than when step 3 actually finishes.

Thank you

CodePudding user response:

You can wrap the related lines in a pair of braces and run that whole group in the background, e.g.:

for f in $f_list
do  
    echo "start process 1"
    code to start process 1
    echo "start process 2"
    code to start process 2
    echo "start process 3" #This step gets the input from step 2
    { code to start process 3 
      echo "process 3 done for ${f} at $(date)"
    } &
done
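
To try the brace-group approach without the real workload, here is a minimal sketch. It assumes `sleep 1` as a stand-in for step 3 and writes the completion lines to a temporary file (`log_file`, a name made up for this example) so they can be inspected afterwards. The `wait` at the end is optional; it just keeps the script from exiting before the background groups finish.

```shell
#!/usr/bin/env bash
# Sketch only: 'sleep 1' stands in for the real, long-running step 3.
f_list='a b c d'
log_file=$(mktemp)   # hypothetical log file for the completion messages

for f in $f_list
do
    echo "start process 3 for ${f}"
    {
        sleep 1                                              # placeholder for step 3
        echo "process 3 done for ${f} at $(date)" >> "$log_file"
    } &                                                      # the whole group runs in the background
done

wait                 # block until every background group has finished
cat "$log_file"
```

Because `$(date)` sits inside the braces, it is evaluated only after the placeholder step returns, so each line carries the actual finish time for that element.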

If 'process 3' consists of many steps, you can still use the {} wrapper, or you can modularize the step with a function, e.g.:

process_3() {
    input_file=$1
    do some stuff with "${input_file}"
    do more stuff with "${input_file}"
    do even more stuff with "${input_file}"
    echo "process 3 done for ${input_file} at $(date)"
}

for f in $f_list
do  
    echo "start process 1"
    code to start process 1
    echo "start process 2"
    code to start process 2
    echo "start process 3" #This step gets the input from step 2

    process_3 "${f}" &
done
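
If you also want to know whether each background run succeeded, one option is to record the PID from `$!` and `wait` on each one after the loop. The sketch below is an illustration under assumptions: `sleep 1` replaces the real work, and the failure for input `c` is simulated on purpose to show the status check.

```shell
#!/usr/bin/env bash
# Sketch only: process_3 here is a stand-in, not the real step 3.
process_3() {
    input_file=$1
    sleep 1                                   # placeholder for the real work
    echo "process 3 done for ${input_file} at $(date)"
    [ "${input_file}" != "c" ]                # simulated failure for input 'c'
}

f_list='a b c d'
pids=''
for f in $f_list
do
    process_3 "${f}" &
    pids="${pids} $!"                         # remember the PID of the background job
done

failed=0
for pid in $pids
do
    # 'wait PID' returns the exit status of that background job
    wait "$pid" || failed=$((failed + 1))
done
echo "${failed} job(s) failed"
```

Waiting on individual PIDs, rather than a bare `wait`, is what makes each job's exit status available to the script.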