Bash script to download PDF using a CSV with name and url and auto-increment name

Time:10-17

I'm trying to create a bash script that reads a CSV with two columns:

first column = name
second column = URL

For each row, the script should download the PDF from the URL in the second column (on the server the file has a random name made of letters and numbers, e.g. djdiede.pdf) and save it under the name from the first column.

Names can repeat, so when a name is a duplicate I want to append a number, like this:

Example   $5000.pdf
Example   $5000.1.pdf
Example   $5000.2.pdf

The problem is that neither wget nor curl auto-increments the file name when I set it with their output options (wget -O, curl -o). I've tried a lot of things, but with my limited experience it is taking too much time.

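I created a counter that appends the line number to the file name, but with a longer CSV this produces unnecessary numbers: the suffix is the line number rather than the next free number, so it can jump to something like .37.pdf instead of .1.pdf. (Code below.)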

There must be a better method, but my lack of knowledge is slowing me down, so any help would be really appreciated. I'm a beginner with bash scripts.

Thanks for any help in advance!

CSV example:

Example   $5000,HTTP://example.com/djdiede.pdf
Example   $5000,HTTP://example.com/djdi42322ede.pdf
Example 0 $1000,HTTP://example.com/djd4234iede.pdf
Example P $1000,HTTP://example.com/dj43566diede.pdf
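To see how `IFS=, read -r` (used in both scripts in this thread) splits these rows, here is a minimal sketch; `print_fields` is a hypothetical helper name. Because only the comma is in `IFS`, runs of spaces inside the names are kept, and everything after the first comma goes into the last variable. The `$` in `$5000` is just data: `read` does not expand it. A name that itself contained a comma would be cut at that comma, so this sketch assumes names never contain commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each CSV row's two fields in brackets so that
# repeated or leading/trailing spaces are visible.
print_fields() {
    local name url
    # Only ',' is in IFS, so spaces inside the name are preserved; everything
    # after the first comma is assigned to the last variable, url.
    while IFS=, read -r name url; do
        printf 'name=[%s] url=[%s]\n' "$name" "$url"
    done
}

# Quoted 'EOF' keeps the here-document literal, so $5000 is not expanded.
print_fields <<'EOF'
Example   $5000,HTTP://example.com/djdiede.pdf
Example 0 $1000,HTTP://example.com/djd4234iede.pdf
EOF
```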

Code so far:

#!/bin/bash -e
COUNTER=1
while IFS=, read -r field1 field2
do
    COUNTER=$[$COUNTER + 1]
    if [ "$field1" == "" ]
    then
        echo "Line $COUNTER field1 is empty or no value set"
    elif [ "$field2" == "" ]
    then
        echo "Line $COUNTER field2 is empty or no value set"
    else
        pdf_file=$(echo $field1 | tr '/' ' ')
        echo "================================================"
        echo "Downloading $COUNTER $pdf_file..."
        echo "================================================"
        pdf_file_test="$pdf_file.pdf"
        if [ -e "$pdf_file_test" ]; then
            echo -e "\033[32m ^^^ File already exists!!! Adding line number at the end of the file: $pdf_file.$COUNTER.pdf \033[0m" >&2
            wget -q -nc -O "$pdf_file."$COUNTER.pdf $field2
        else
            wget -q -nc -O "$pdf_file".pdf $field2
        fi
    fi
done < test.csv
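One detail worth checking in the code above: `pdf_file=$(echo $field1 | tr '/' ' ')` leaves `$field1` unquoted, so the shell word-splits it and `echo` rejoins the words with single spaces; `Example   $5000` would be saved as `Example $5000.pdf`. A minimal sketch comparing the two forms, plus a pure-bash alternative using `${var//pattern/replacement}`. The sample name `Example   $5000/Q1` and the function name `to_file_name` are illustrative only.

```shell
#!/usr/bin/env bash
field1='Example   $5000/Q1'   # hypothetical name containing a '/'

# Unquoted: word splitting collapses the run of spaces (and glob characters
# such as * could even expand to file names in the current directory).
unquoted=$(echo $field1 | tr '/' ' ')

# Quoted: the name reaches tr unchanged.
quoted=$(echo "$field1" | tr '/' ' ')

# Pure-bash alternative: replace every '/' with a space without a subprocess.
# printf is used instead of echo so a name starting with -n or -e is not
# treated as an option.
to_file_name() {
    printf '%s\n' "${1//\// }"
}

printf '[%s]\n[%s]\n' "$unquoted" "$quoted"
to_file_name "$field1"
```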

Answer:

This should help. I tried to stay close to your own coding style:

#!/bin/bash -e
LINECOUNTER=0
while IFS=, read -r field1 field2
do
    LINECOUNTER=$[$LINECOUNTER + 1]
    if [ "$field1" == "" ]
    then
        echo "Line $LINECOUNTER: field1 is empty or no value set"
    elif [ "$field2" == "" ]
    then
        echo "Line $LINECOUNTER: field2 is empty or no value set"
    else
        pdf_file=$(echo "$field1" | tr '/' ' ')
        echo "================================================"
        echo "Downloading $LINECOUNTER: $pdf_file..."
        echo "================================================"
        pdf_file_saveas="$pdf_file.pdf"
        FILECOUNTER=0
        while [ -e "$pdf_file_saveas" ]
        do
            FILECOUNTER=$[$FILECOUNTER + 1]
            pdf_file_saveas="$pdf_file.$FILECOUNTER.pdf"
        done
        if [ $FILECOUNTER -gt 0 ]
        then
            echo -e "\033[32m ^^^ File already exists!!! Adding number at the end of the file: $pdf_file_saveas \033[0m" >&2
        fi
        wget -q -nc -O "$pdf_file_saveas" "$field2"
    fi
done < test.csv

Here's what I did:

  • use two counters: one for lines, one for files
  • when a file already exists, use the file counter loop to find the next free slot (i.e. the first file named <filename>.<counter-value>.pdf that does not exist yet)
  • fixed the wrong line numbers (the line counter is incremented before it is printed, so it has to start at 0 instead of 1)
  • added double quotes where necessary or advisable (e.g. around $field1 and $field2)
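The file counter loop above can be pulled out into a small function, which makes the naming rule easy to test without downloading anything. A minimal sketch, with `next_free_name` as a hypothetical name: it prints `<base>.pdf` if that file does not exist, otherwise the first free `<base>.<n>.pdf` starting at 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Prints the first file name that does not exist yet:
#   <base>.pdf, then <base>.1.pdf, <base>.2.pdf, ...
next_free_name() {
    local base=$1
    local candidate="$base.pdf"
    local n=0
    while [ -e "$candidate" ]; do
        n=$((n + 1))
        candidate="$base.$n.pdf"
    done
    printf '%s\n' "$candidate"
}
```

In the loop it would be used as `pdf_file_saveas=$(next_free_name "$pdf_file")`. Note that checking for the file and downloading it are two separate steps, so two copies of the script running in the same directory at the same time could still pick the same name.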

If you want to improve your script further, here are some suggestions:

  • instead of the big if ... elif ... else construct, you can skip invalid lines with continue, e.g. if [ "$field1" == "" ]; then continue; fi or even [ "$field1" == "" ] && continue
  • instead of terminating on error (#!/bin/bash -e), you could remove -e and add error detection and handling after the wget call (with -e the script would exit before reaching the check), e.g. if [ $? -ne 0 ]; then echo "failed to download ..."; fi
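Combining both suggestions, here is a sketch of the loop with early `continue`s and an explicit check of the download's exit status instead of `bash -e`. Several things in it are assumptions not present in the original scripts: the download is wrapped in a hypothetical `fetch` function (so it can be swapped out for testing), the common `|| [ -n "$name" ]` idiom is added so a last line without a trailing newline is not skipped, a trailing carriage return is stripped in case the CSV was saved with Windows line endings, and a failed download's output file is deleted because `wget -O` may leave an empty file behind.

```shell
#!/usr/bin/env bash
# No -e: failures are handled per line instead of aborting the whole run.

# Hypothetical wrapper around the actual download (same wget flags as above).
fetch() {
    wget -q -nc -O "$1" "$2"
}

process_csv() {
    local csv=$1 line_no=0 name url pdf_file save_as n
    # '|| [ -n "$name" ]' also processes a last line with no trailing newline.
    while IFS=, read -r name url || [ -n "$name" ]; do
        line_no=$((line_no + 1))
        url=${url%$'\r'}    # drop a CR left by Windows line endings
        if [ -z "$name" ]; then
            echo "Line $line_no: name is empty, skipping" >&2
            continue
        fi
        if [ -z "$url" ]; then
            echo "Line $line_no: URL is empty, skipping" >&2
            continue
        fi
        pdf_file=${name//\// }   # replace '/' with a space
        # Same "first free slot" search as in the answer's file counter loop.
        save_as="$pdf_file.pdf"
        n=0
        while [ -e "$save_as" ]; do
            n=$((n + 1))
            save_as="$pdf_file.$n.pdf"
        done
        if ! fetch "$save_as" "$url"; then
            echo "Line $line_no: failed to download $url" >&2
            rm -f -- "$save_as"   # remove a possibly empty output file
        fi
    done < "$csv"
}
```

It would be called as `process_csv test.csv`. Keeping the download behind `fetch` means the rest of the loop can be exercised with a fake download, without touching the network.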