Script Getting Stuck While Sorting Files Using Grep

Time:07-04

I'm currently working on a matching script that does a few things:

  1. Takes a list of keywords
  2. For each keyword, searches the directory for grep matches
  3. For each match, copies the file into a "Sorted/{keyword}" directory

The functionality seems fine, apart from a couple of issues:

  1. When I run the script, it seems to get stuck on the first iteration of the loop until I press Ctrl-C, then it spits out a lot of the console messages I would expect to be receiving throughout the process.
  2. It takes absurdly long to finish (which might be something that there's no way around, but any optimization advice would be greatly appreciated).

A small note: I am using pdfgrep. It seems to work essentially the same way as grep, but I thought it was worth mentioning.

I'm pretty new to scripting, so please feel free to critique and correct.

Thanks!

#!/bin/bash

# Keyword list
keywords=(
    "Keyword1"
    "Keyword2"
    "Keyword3"
);

mkdir "$HOME/Sorted";
echo "Matching list of keywords/phrases ... (${#keywords[@]}) in length...";
for ((i = 0; i < ${#keywords[@]}; i++))
do
    echo "Matching ${keywords[$i]}...";
    mkdir "$HOME/Sorted/${keywords[$i]}";
    pdfgrep -lir "${keywords[$i]}" $HOME/PDFs/* | xargs -I{} cp {} -t $HOME/Sorted/"${keywords[$i]}";
done

echo "Finished that matching session... ";
echo "###########################";
echo "Unable to match:"
find $HOME/Sorted/ -type d -empty -printf "%P\n";
find $HOME/Sorted/ -type d -empty -delete;

Answer:

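xargs is probably the culprit. Add the --no-run-if-empty (aka -r) option so that cp is not run when pdfgrep finds nothing, and set the delimiter to \0 with -0 (in conjunction with pdfgrep -lZ) so that file names containing spaces or quotes are passed through intact: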

#!/bin/bash

keywords=(
    "Keyword1"
    "Keyword2"
    "Keyword3"
)

for kw in "${keywords[@]}"
do
    printf 'Matching keyword: %q\n' "$kw"
    folder="$HOME"/Sorted/"$kw"
    mkdir -p "$folder" || exit 1
    pdfgrep -irlZ "$kw" "$HOME"/PDFs/ | xargs -0 -r cp -t "$folder/"
done

echo "Unmatched keywords:"
find "$HOME"/Sorted -mindepth 1 -maxdepth 1 -type d -empty -delete -printf "\t%P\n"
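
To see what -Z / -0 and -r change in practice, here is a small self-contained sketch. It uses plain grep as a stand-in for pdfgrep (an assumption, so that it runs without any PDFs), and the temporary directory and file names are made up for illustration:

```shell
#!/bin/bash
# Sketch: NUL-delimited file lists survive spaces in names, and -r skips cp
# when nothing matched. Plain grep stands in for pdfgrep (illustration only).
work=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT        # clean up when the script exits

mkdir -p "$work/src" "$work/hit" "$work/miss"
printf 'alpha keyword\n' > "$work/src/my report.txt"
printf 'nothing here\n'  > "$work/src/other.txt"

# A match: the file name contains a space; -Z and -0 pass it through as one argument.
grep -rlZ -i "keyword" "$work/src/" | xargs -0 -r cp -t "$work/hit/"

# No match: because of -r, xargs never starts cp at all.
grep -rlZ -i "absent" "$work/src/" | xargs -0 -r cp -t "$work/miss/"

ls "$work/hit"       # prints: my report.txt
ls -A "$work/miss"   # prints nothing, the directory stays empty
```

Without -r, GNU xargs would run cp once with no file arguments in the second case, and cp would complain about a missing file operand.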

Aside: You could create symbolic or even hard links to the PDFs (with ... | xargs -0 -r ln -s -t) instead of copying them; that will be faster and save disk space.
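
A minimal sketch of the linking variant, again with plain grep standing in for pdfgrep and with made-up paths under a temporary directory (both are assumptions for illustration). Note that the search path is absolute, as it is with "$HOME"/PDFs/, so the symbolic links resolve from inside the target folder:

```shell
#!/bin/bash
# Sketch: link matching files into the keyword folder instead of copying them.
# Plain grep stands in for pdfgrep; all paths below are hypothetical.
work=$(mktemp -d)                 # mktemp -d returns an absolute path
trap 'rm -rf "$work"' EXIT

kw="Keyword1"
folder="$work/Sorted/$kw"
mkdir -p "$work/PDFs" "$folder"
printf 'Keyword1 appears here\n' > "$work/PDFs/a.txt"

# Same pipeline shape as the copy version, with "ln -s -t" in place of "cp -t".
grep -rlZ -i "$kw" "$work/PDFs/" | xargs -0 -r ln -s -t "$folder/"

ls -l "$folder"                   # shows a.txt as a symlink into PDFs/
```

If the search path were relative, the links would store that relative path and could end up dangling, so an absolute path is the safer choice here.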
