I'm currently working on a matching script that does a few things:
- Takes a list of keywords
- For each keyword, looks through the directory for grep matches
- For each grep match, copies the file into a "Sorted/{keyword}" directory
The functionality seems fine, apart from a couple of issues:
- When I run the script, it seems to get stuck on the first iteration of the loop until I press Ctrl-C, then it spits out a lot of the console messages I would expect to be receiving throughout the process.
- It takes absurdly long to finish (which might be something that there's no way around, but any optimization advice would be greatly appreciated).
A small note: I am using pdfgrep. It seems to behave much like grep, but I thought it was worth mentioning.
I'm pretty new to scripting, so please feel free to critique and correct.
Thanks!
#!/bin/bash
# Keyword list
keywords=(
"Keyword1"
"Keyword2"
"Keyword3"
);
mkdir "$HOME/Sorted";
echo "Matching list of keywords/phrases ... (${#keywords[@]}) in length...";
for ((i = 0; i < ${#keywords[@]}; i++))
do
echo "Matching ${keywords[$i]}...";
mkdir "$HOME/Sorted/${keywords[$i]}";
pdfgrep -lir "${keywords[$i]}" $HOME/PDFs/* | xargs -I{} cp {} -t $HOME/Sorted/"${keywords[$i]}";
done
echo "Finished that matching session... ";
echo "###########################";
echo "Unable to match:"
find $HOME/Sorted/ -type d -empty -printf "%P\n";
find $HOME/Sorted/ -type d -empty -delete;
Answer:
xargs is probably the culprit; you should add the --no-run-if-empty (aka -r) option and specify the delimiter to be \0 (in conjunction with pdfgrep -lZ):
#!/bin/bash
keywords=(
"Keyword1"
"Keyword2"
"Keyword3"
)
for kw in "${keywords[@]}"
do
printf 'Matching keyword: %q\n' "$kw"
folder="$HOME"/Sorted/"$kw"
mkdir -p "$folder" || exit 1
pdfgrep -irlZ "$kw" "$HOME"/PDFs/ | xargs -0 -r cp -t "$folder/"
done
echo "Unmatched keywords:"
find "$HOME"/Sorted -mindepth 1 -maxdepth 1 -type d -empty -delete -printf "\t%P\n"
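To see what the two xargs options do without needing any PDFs, here is a minimal sketch that fakes the NUL-delimited output of pdfgrep -lZ with printf. The temporary directory and the file names are made up for the demo, and it assumes GNU xargs and GNU cp (for -r and -t):

```shell
#!/bin/bash
# Sketch only: printf stands in for `pdfgrep -lZ`; all paths are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dest" "$demo/empty_dest"
touch "$demo/src/annual report.pdf" "$demo/src/it's here.pdf"

# NUL-delimited input: names containing spaces or quotes reach cp intact.
printf '%s\0' "$demo/src/annual report.pdf" "$demo/src/it's here.pdf" |
    xargs -0 -r cp -t "$demo/dest/"

# Empty input: with -r, cp is never started, so there is no
# "missing file operand" error when a keyword has no matches.
printf '' | xargs -0 -r cp -t "$demo/empty_dest/"
empty_status=$?

ls "$demo/dest"
```

Without -0, xargs splits its input on whitespace and treats quotes specially, so a file called "annual report.pdf" would be passed to cp as two separate arguments.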
Aside: You could create symbolic or even hard links to the PDF (with ... | xargs -0 -r ln -s -t) instead of copying them; that'll be faster and save disk space.
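A rough sketch of the linking variant, in case it helps. link_matches is a hypothetical helper name, the paths are made up, and printf again stands in for pdfgrep -irlZ; it assumes GNU ln (for -t). Note that the paths fed to ln -s should be absolute, which they are when pdfgrep is given "$HOME"/PDFs/, otherwise the links will dangle:

```shell
#!/bin/bash
# link_matches DIR: read NUL-delimited paths on stdin, symlink them into DIR.
# Hypothetical helper for illustration, not part of the script above.
link_matches() {
    local folder=$1
    mkdir -p "$folder" || return 1
    # -f replaces a link left over from an earlier run instead of failing
    xargs -0 -r ln -sf -t "$folder/"
}

# Demo with made-up paths; in the real script the left-hand side would be
# something like: pdfgrep -irlZ "$kw" "$HOME"/PDFs/
work=$(mktemp -d)
mkdir -p "$work/PDFs"
echo "dummy" > "$work/PDFs/a b.pdf"
printf '%s\0' "$work/PDFs/a b.pdf" | link_matches "$work/Sorted/Keyword1"
```

Dropping -s would create hard links instead, which only works when the PDFs and the Sorted directory are on the same filesystem.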