How can I use wget to download specific files in a CSV file, and then store those files into specifi

Time:12-14

I have been attempting to process a CSV file full of image URLs (about 1000). Each row is a specific product, with the first cell labelled "id". I took the ID of each row in Excel and created a directory for each one using a loop with mkdir.

My issue now is that I can't figure out how to download each image and immediately store it in these folders.

What I am attempting here is to concatenate "EXT" and "fold_name" to get a directory path like "/name_of_folder", then take the links to the images (in cells 5, 6, 7 and 8) and use wget to download them into that directory.

Can anyone assist me with this? I think this should be straightforward enough.

Thank you!

#!/usr/bin/bash

EXT='/'
while read line
do

  fold_name= cut -d$',' -f1
  concat= "%EXT"   "%fold_name"

  img1= cut -d$',' -f5
  img2= cut -d$',' -f6
  img3= cut -d$',' -f7
  img4= cut -d$',' -f8

  wget -O "%img1" "%concat"
  wget -O "%img2" "%concat"
  wget -O "%img1" "%concat"
  wget -O "%img2" "%concat"
done < file.csv

CodePudding user response:

You might use the -P switch to designate the target directory. Consider the following simple example, using some files from the test-images/png repository:

mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png

This will lead to the following structure:

black
    cs-black-000.png
gray
    cs-gray-7f7f7f.png
white
    cs-white-fff.png
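
Applied to the CSV from the question, the same switch could be driven from a loop. The following is only a sketch, assuming a simple CSV with no quoted fields or embedded commas, the ID in column 1 and the image URLs in columns 5 to 8 (a header row, if present, would need to be skipped first). The names download_row_images, download_csv_images and the WGET variable are made up for illustration; WGET exists only so that another command can be substituted for a dry run.

```shell
#!/usr/bin/bash
# Sketch only: assumes a simple CSV (no quoted fields, no embedded commas),
# with the ID in column 1 and image URLs in columns 5-8.

# Hypothetical hook: set WGET to another command (e.g. echo) for a dry run.
WGET="${WGET:-wget}"

# A literal carriage return, used to strip CRLF line endings (common in Excel exports).
CR="$(printf '\r')"

# Hypothetical helper: download every non-empty URL of one row into the
# directory named after the row's ID.
download_row_images() {
    local id="$1"
    shift
    local url
    for url in "$@"
    do
        url="${url%"${CR}"}"          # drop a trailing carriage return, if any
        [ -n "${url}" ] || continue   # skip empty cells
        # -P sets the directory prefix under which wget saves the file.
        ${WGET} -P "${id}" "${url}"
    done
}

# Hypothetical helper: split each row on commas and pass the ID and
# columns 5-8 to download_row_images.
download_csv_images() {
    local csv="$1"
    local id c2 c3 c4 img1 img2 img3 img4 rest
    while IFS=, read -r id c2 c3 c4 img1 img2 img3 img4 rest || [ -n "${id}" ]
    do
        download_row_images "${id}" "${img1}" "${img2}" "${img3}" "${img4}"
    done < "${csv}"
}

# Usage (file.csv is the name used in the question):
#   download_csv_images file.csv
```

Splitting with "IFS=, read" avoids calling cut once per column, at the cost of not handling quoted CSV fields.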

CodePudding user response:

You should use variable names that are less ambiguous.

You need to provide the directory as part of the output filename.

"%" is not a bash variable designator; bash expands variables with "$". "%" is a formatting directive (for printf in bash, awk, C, etc.).
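
As a small illustration of that point (a sketch using a made-up sample row, not the asker's data): an assignment must have no spaces around "=", the output of a command is captured with "$( ... )", and the value is read back with "$".

```shell
#!/usr/bin/bash
# Made-up sample row, for illustration only.
line='1234,a,b,c,https://example.com/img1.png'

# No spaces around "="; "$( ... )" captures the output of cut.
fold_name=$(echo "${line}" | cut -d',' -f1)
img1=$(echo "${line}" | cut -d',' -f5)

# Variables are expanded with "$", not "%".
echo "folder: ${fold_name}"
echo "image:  ${img1}"
```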

The following will provide what you want.

#!/usr/bin/bash

DBG=1

INPUT="${1}"
# For this demo the argument is overridden; the sample data below is written to file.csv.
INPUT="file.csv"


cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt

if [ ${DBG} -eq 1 ]
then
    echo -e "\n Input file:"
    cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
    echo -e "\n Hit return to continue ..." ; read k
fi

REPO_ROOT='/tmp'

grep -v '^#' "${INPUT}" |
while IFS= read -r line
do
    topic_name=$(echo "${line}" | cut -f1 -d\, )
    test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."

    folder="${REPO_ROOT}/${topic_name}"
    test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."

    if [ ! -d "${folder}" ]
    then
        mkdir "${folder}"
    else
        rm -f "${folder}/"*
    fi

    if [ ! -d "${folder}" ]
    then
        echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2 
    else
        test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'

        url1=$(echo "${line}" | cut -d\, -f5 )
        url2=$(echo "${line}" | cut -d\, -f6 )
        url3=$(echo "${line}" | cut -d\, -f7 )
        url4=$(echo "${line}" | cut -d\, -f8 )

        test ${DBG} -eq 1 && {
            echo -e "\n URLs extracted:"
            echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
        }

        #imageFile1=$( basename "${url1}" | sed 's ^img_ yourImagePrefix_ ' )
        #imageFile2=$( basename "${url2}" | sed 's ^img_ yourImagePrefix_ ' )
        #imageFile3=$( basename "${url3}" | sed 's ^img_ yourImagePrefix_ ' )
        #imageFile4=$( basename "${url4}" | sed 's ^img_ yourImagePrefix_ ' )

        imageFile1=$( basename "${url1}" | sed 's ^cs- yourImagePrefix_ ' )
        imageFile2=$( basename "${url2}" | sed 's ^cs- yourImagePrefix_ ' )
        imageFile3=$( basename "${url3}" | sed 's ^cs- yourImagePrefix_ ' )

        test ${DBG} -eq 1 && {
            echo -e "\n Image filenames assigned:"
            #echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
            echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
        }


        test ${DBG} -eq 1 && {
            echo -e "\n WGET process log:"
        }

        ### This form of wget does NOT work for me: -O names the complete output file,
        ### so the directory given with -P is not applied to it.
        #wget -P "${folder}" -O "${imageFile1}" "${url1}"

        ### This form of wget DOES work for me
        wget -O "${folder}/${imageFile1}" "${url1}"
        wget -O "${folder}/${imageFile2}" "${url2}"
        wget -O "${folder}/${imageFile3}" "${url3}"
        #wget -O "${folder}/${imageFile4}" "${url4}"

        test ${DBG} -eq 1 && {
            echo -e "\n Listing of downloaded files:"
            ls -l "${REPO_ROOT}"/topic* 2>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
        }
    fi
done

The script is adapted for what I had to work with. :-)
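
If the original file names are acceptable, the basename/sed step could be dropped and the target path built with shell parameter expansion alone. This is a sketch rather than part of the script above; target_path is a hypothetical helper name, and it assumes URLs without a query string or trailing slash.

```shell
#!/usr/bin/bash
# Hypothetical helper: print "<folder>/<last path component of the URL>".
# Assumes the URL has no query string and no trailing slash.
target_path() {
    local folder="$1"
    local url="$2"
    # "${url##*/}" removes the longest prefix ending in "/", much like basename.
    printf '%s/%s\n' "${folder}" "${url##*/}"
}

# Example use with wget (the URL is one of those used above):
#   url='https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png'
#   wget -O "$(target_path /tmp/topic_1 "${url}")" "${url}"
```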
