I'm trying to write a bash script that reads a CSV file with two columns:
first column = name
second column = URL
For each row, the script should download the PDF from the URL in the second column (the remote file has a random name made of letters and numbers, ending in .pdf) and save it under the name from the first column.
The names can be duplicated, so when a name is already taken I want to append a number, like this:
Example $5000.pdf
Example $5000.1.pdf
Example $5000.2.pdf
I need this because neither wget nor curl auto-increments the file name when I use the output option. I have tried a lot of things, but I keep running into my own limitations and it is taking too much time.
I created a counter that appends the CSV line number to the file name, but with a larger CSV the appended numbers get unnecessarily large (code below).
There is probably a better method. I'm a beginner at bash scripting, so any help would be really appreciated.
Thanks for any help in advance!
CSV example:
Example $5000,HTTP://example.com/djdiede.pdf
Example $5000,HTTP://example.com/djdi42322ede.pdf
Example 0 $1000,HTTP://example.com/djd4234iede.pdf
Example P $1000,HTTP://example.com/dj43566diede.pdf
Code so far:
#!/bin/bash -e
COUNTER=1
while IFS=, read -r field1 field2
do
    COUNTER=$((COUNTER + 1))
    if [ "$field1" == "" ]
    then
        echo "Line $COUNTER field1 is empty or no value set"
    elif [ "$field2" == "" ]
    then
        echo "Line $COUNTER field2 is empty or no value set"
    else
        pdf_file=$(echo $field1 | tr '/' ' ')
        echo "================================================"
        echo "Downloading $COUNTER $pdf_file..."
        echo "================================================"
        pdf_file_test="$pdf_file.pdf"
        if [ -e "$pdf_file_test" ]; then
            echo -e "\033[32m ^^^ File already exists!!! Adding line number at the end of the file: $pdf_file.$COUNTER.pdf \033[0m" >&2
            wget -q -nc -O "$pdf_file."$COUNTER.pdf $field2
        else
            wget -q -nc -O "$pdf_file".pdf $field2
        fi
    fi
done < test.csv
Answer:
This should help. I tried to stay close to your own coding style:
#!/bin/bash -e
LINECOUNTER=0
while IFS=, read -r field1 field2
do
    LINECOUNTER=$((LINECOUNTER + 1))
    if [ "$field1" == "" ]
    then
        echo "Line $LINECOUNTER: field1 is empty or no value set"
    elif [ "$field2" == "" ]
    then
        echo "Line $LINECOUNTER: field2 is empty or no value set"
    else
        # Slashes are not allowed in file names, so replace them with spaces
        pdf_file=$(echo "$field1" | tr '/' ' ')
        echo "================================================"
        echo "Downloading $LINECOUNTER: $pdf_file..."
        echo "================================================"
        pdf_file_saveas="$pdf_file.pdf"
        FILECOUNTER=0
        # Count up until we reach a file name that is not taken yet
        while [ -e "$pdf_file_saveas" ]
        do
            FILECOUNTER=$((FILECOUNTER + 1))
            pdf_file_saveas="$pdf_file.$FILECOUNTER.pdf"
        done
        if [ "$FILECOUNTER" -gt 0 ]
        then
            echo -e "\033[32m ^^^ File already exists!!! Adding number at the end of the file: $pdf_file_saveas \033[0m" >&2
        fi
        wget -q -nc -O "$pdf_file_saveas" "$field2"
    fi
done < test.csv
Here's what I did:
- use two counters: one for lines, one for files
- when a file already exists, use the file counter loop to find the next 'empty slot' (i.e. a file named <filename>.<counter-value>.pdf that does not exist)
- fixed wrong line numbers (line counter needs to start at 0 instead of 1)
- added double quotes where necessary/advisable
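The 'empty slot' search can also be pulled out into a small function, which keeps the main loop shorter. This is only a minimal sketch: next_free_name is a hypothetical helper name, and the optional directory argument is an assumption added so the function can be tried out in a scratch directory.

```shell
#!/bin/bash
# next_free_name is a hypothetical helper, not part of the script above.
# It prints the first of "<base>.pdf", "<base>.1.pdf", "<base>.2.pdf", ...
# that does not exist yet in the given directory (default: current directory).
next_free_name() {
    local base="$1"
    local dir="${2:-.}"
    local candidate="$base.pdf"
    local n=0
    while [ -e "$dir/$candidate" ]; do
        n=$((n + 1))
        candidate="$base.$n.pdf"
    done
    printf '%s\n' "$candidate"
}
```

In the loop you could then write something like wget -q -O "$(next_free_name "$pdf_file")" "$field2".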
If you want to improve your script further, here are some suggestions:
- instead of the big if ... elif ... else construct, you can use if with continue, e.g. if [ "$field1" == "" ]; then continue; fi or even [ "$field1" == "" ] && continue
- instead of terminating on error (#!/bin/bash -e), you could add error detection and handling after the wget call, e.g. if [ $? -ne 0 ]; then echo "failed to download ..."; fi
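Put together, the two suggestions could look roughly like the sketch below. It is not a drop-in replacement: fetch and download_all are hypothetical names, and removing the partial file after a failed download is an assumption about what you want (wget -O may leave an empty file behind when the download fails).

```shell
#!/bin/bash
# fetch is a hypothetical wrapper around the wget call; keeping it separate
# makes it easy to swap in curl or a stub while testing.
fetch() {
    wget -q -O "$1" "$2"
}

# download_all is a hypothetical name. It reads "name,url" lines from the
# CSV file given as $1, skips incomplete lines with continue, and reports
# failed downloads instead of aborting the whole run.
download_all() {
    local csv="$1"
    local line=0 failed=0 n name url target
    while IFS=, read -r name url; do
        line=$((line + 1))
        if [ -z "$name" ] || [ -z "$url" ]; then
            echo "Line $line: missing name or URL, skipping" >&2
            continue
        fi
        name=$(printf '%s' "$name" | tr '/' ' ')
        target="$name.pdf"
        n=0
        while [ -e "$target" ]; do      # find the next free file name
            n=$((n + 1))
            target="$name.$n.pdf"
        done
        if ! fetch "$target" "$url"; then
            echo "Line $line: failed to download $url" >&2
            rm -f -- "$target"          # assumption: drop the partial file
            failed=$((failed + 1))
        fi
    done < "$csv"
    [ "$failed" -eq 0 ]                 # non-zero status if anything failed
}
```

Calling download_all test.csv then processes every line and returns a non-zero status only at the end if at least one download failed.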