Bash wait command ignoring specified process IDs

Time:06-10

DIRECTORIES=( group1 group2 group3 group4 group5 )
PIDS=()

function GetFileSpace() {
    shopt -s nullglob
    TARGETS=(/home/${1}/data/*)
    for ITEM in "${TARGETS[@]}"
    do
            # Here we launch du on a user in the background
            # And then add their process id to PIDS
            du -hs $ITEM >> ./${1}_filespace.txt &
            PIDS+=($!)
    done
}

# Here I launch function GetFileSpace for each group.
for GROUP in "${DIRECTORIES[@]}"
do
    echo $GROUP
    # Store standard error to collect files with bad permissions
    GetFileSpace $GROUP 2>> ./${GROUP}_permission_denied.txt &
done

for PID in "${PIDS[@]}"
do
    wait $PID
done

echo "Formatting Results..."
# The script continues after this, but the rest isn't relevant.

I am trying to write a script that monitors storage volume and file permissions of individual users across 5 groups.

|_home          # For additional reference to understand my code,
  |_group1      # directories are laid out like this
  | |_data
  |   |_user1
  |   |_user2
  |   |_user3
  |
  |_group2
    |_data
      |_user4
      |_user5

First, I use a loop to iteratively launch a function, GetFileSpace, for each group in DIRECTORIES. This function then runs du -sh for each user found within a group.

To speed up this whole process, I launch each instance of GetFileSpace and the subsequent du -sh sub processes in the background with &. This makes it so everything can run pretty much simultaneously, which takes much less time.

My issue is that after I launch these processes I want my script to wait for every background instance of du -sh to finish before moving on to the next step.

To do this, I have tried to collect process IDs after each task is launched within the array PIDS. Then I try to loop through the array and wait for each PID until all sub-processes finish. Unfortunately this doesn't seem to work. The script correctly launches du -sh for each user, but then immediately tries to move on to the next step, breaking.

My question then, is why does my script not wait for my background tasks to finish and how can I implement this behavior?

As a final note, I have tried several other methods to accomplish this from this SO post, but haven't been able to get them working either.

CodePudding user response:

GetFileSpace ... &

You are running the whole function as a subprocess. The parent script therefore moves on to the next step immediately, and PIDS is still empty there, because it is only populated inside the subprocess.
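A minimal sketch of that effect, assuming bash. The names collect, pids, count_bg and count_fg are made up for illustration and are not from the question. The same function is called once with & and once without, and only the foreground call changes the array the parent sees:

```shell
#!/usr/bin/env bash
# Illustration only: collect, pids, count_bg and count_fg are hypothetical names.
pids=()

collect() {
    sleep 0.1 &        # stand-in for a background du
    pids+=("$!")       # record the PID in whichever shell runs this function
    wait               # wait for the sleep started above
}

collect &              # runs in a subshell, which works on its own copy of pids
wait                   # wait for that subshell to exit
count_bg=${#pids[@]}
echo "after background call: $count_bg"

collect                # runs in the current shell, so pids is updated here
count_fg=${#pids[@]}
echo "after foreground call: $count_fg"
```

The first echo reports 0 entries and the second reports 1, which is why the wait loop in the question has nothing to wait for.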

Do not run it in the background.

GetFileSpace ...   # no & on the end.
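Put together, the change might look like the sketch below. It is not the asker's exact script. base_dir and out_dir are hypothetical stand-ins (a throwaway tree instead of /home) so the example can run anywhere, and the variable names are lower-cased. The du jobs still run in the background; only the function call itself stays in the foreground so that pids is filled in the parent shell:

```shell
#!/usr/bin/env bash
# Sketch under assumptions: base_dir and out_dir are hypothetical temp
# directories standing in for /home and the current directory.
base_dir=$(mktemp -d)
out_dir=$(mktemp -d)
mkdir -p "$base_dir"/group1/data/user1 "$base_dir"/group1/data/user2 \
         "$base_dir"/group2/data/user4

directories=(group1 group2)
pids=()
shopt -s nullglob

get_file_space() {
    local group=$1 item
    local -a targets
    targets=("$base_dir/$group"/data/*)
    for item in "${targets[@]}"; do
        # du runs in the background; its PID is appended in the parent shell
        du -hs "$item" >> "$out_dir/${group}_filespace.txt" &
        pids+=("$!")
    done
}

for group in "${directories[@]}"; do
    # No trailing & here: the function runs in this shell.
    get_file_space "$group" 2>> "$out_dir/${group}_permission_denied.txt"
done

for pid in "${pids[@]}"; do
    wait "$pid"
done

echo "Waited for ${#pids[@]} du jobs"
```

Since every du is now a direct child of the script, a bare wait with no arguments would also block until all of them have finished, without tracking PIDs at all.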

Notes: Consider using xargs or GNU parallel. Prefer lower case for script local variables. Quote variable expansions. Use shellcheck to check for such errors.

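directories=( group1 group2 group3 group4 group5 )

# Run du on one directory ($2) and append the result to the file for group $1.
work() {
   tmp=$(du -hs "$2")
   echo "$tmp" >> "./${1}_filespace.txt"
}
export -f work   # make the function visible to the bash started by xargs

# Emit one "group path" pair per line; xargs hands each pair to work and runs
# at most $(nproc) jobs at once. Assumes the paths contain no whitespace.
for i in "${directories[@]}"; do
   printf "$i %s\n" /home/"$i"/data/*
done | xargs -n2 -P"$(nproc)" bash -c 'work "$@"' _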
Note that when the job is I/O bound, running multiple processes (especially with no upper bound on their number) doesn't really help much if everything is on one disk.
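If you stay with plain background jobs rather than xargs, one way to impose such an upper bound is sketched below. It assumes bash 4.3 or newer for wait -n. max_jobs, task and the sleep call are hypothetical placeholders for the real du work:

```shell
#!/usr/bin/env bash
# Sketch: cap the number of concurrent background jobs (needs bash >= 4.3).
# max_jobs, task and completed_file are hypothetical names.
max_jobs=2
completed_file=$(mktemp)

task() {
    sleep 0.1                        # stand-in for du -hs on one directory
    echo "$1" >> "$completed_file"
}

for item in a b c d e; do
    # Block while max_jobs jobs are running; wait -n returns as soon as
    # any one of them exits.
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n
    done
    task "$item" &
done

wait                                 # collect the jobs that are still running
completed=$(( $(wc -l < "$completed_file") ))
echo "completed: $completed"
```

Whether any particular limit helps depends on the storage underneath; on a single disk a small value is probably the safer starting point.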
