I created this snippet to loop through a folder, check each .gz file for corruption, and fix any invalid file by gzipping it again. It works fine when there are only a couple of files, but with thousands of files it takes far too long.
Is there a more optimized way to do this?
fix_corrupt_files()
{
    dir=$1
    for f in $dir/*.gz
    do
        if gzip -t $f;
        then :
        else
            log "$(basename $f) is corrupt"
            base="$(basename $f .gz)"
            log "fixing file"
            mv $f $dir/$base
            gzip $dir/$base
            log "file fixed"
        fi
    done
}
CodePudding user response:
This should give you a little speedup:
fix_corrupt_files()
{
    dir="$1"
    for f in "$dir"/*.gz
    do
        {
            if gzip -t "$f";
            then :
            else
                log "$(basename "$f") is corrupt"
                base="$(basename "$f" .gz)"
                log "fixing file"
                mv "$f" "$dir/$base"
                gzip "$dir/$base"
                log "file fixed"
            fi
        } &    # run the whole test-and-fix block in the background
    done
    wait    # wait for all background processes to terminate
}
Note that I'm assuming the gzip commands are your slow part.
All I really did here was run your if statement in the background (with {...} &), so each iteration of your loop runs in parallel. There is a wait at the end of the function, so it won't return until all of the sub-processes complete. That may or may not fit your use case. Also be aware that log is going to get called essentially at random and possibly out of order. Again, whether that matters depends on your use case.
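If the interleaved logging does matter, one small mitigation (just a sketch; it assumes log simply writes its argument to your log destination) is to tag every message with the file name, so that even when lines from different files interleave you can still tell which file each message belongs to. Inside the backgrounded block that could look like:

            name="$(basename "$f")"
            base="$(basename "$f" .gz)"
            if gzip -t "$f";
            then :
            else
                log "$name is corrupt"
                log "$name: fixing file"
                mv "$f" "$dir/$base"
                gzip "$dir/$base"
                log "$name: file fixed"
            fi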
Also note that I added double quotes wherever they should be. It looks like you're confident that there are no spaces in your file names, but it was giving me anxiety.
edit:
Also note that this may bring your machine to its knees. I'm not familiar enough with gzip to know how resource intensive it is, and I don't know how big your archives are. If that becomes a problem, you could add a loop counter that calls wait every X iterations, as sketched below.
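For example, something like this (a sketch only; the batch size of 10 is arbitrary, and log is assumed to be the same helper as in your original function) launches the checks in batches and pauses after every 10 background jobs:

fix_corrupt_files()
{
    dir="$1"
    count=0
    for f in "$dir"/*.gz
    do
        {
            if gzip -t "$f";
            then :
            else
                log "$(basename "$f") is corrupt"
                base="$(basename "$f" .gz)"
                log "fixing file"
                mv "$f" "$dir/$base"
                gzip "$dir/$base"
                log "file fixed"
            fi
        } &
        count=$((count + 1))
        if [ $((count % 10)) -eq 0 ]
        then
            wait    # let the current batch finish before starting more
        fi
    done
    wait    # wait for any remaining background jobs
}

That keeps the number of concurrent gzip processes bounded at roughly the batch size instead of one per file.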