I have been attempting to extract a CSV file full of URL's of images (about 1000). Each row is a specific product with the first cell labelled "id". I have taken the ID of each line in excel and created directories for them using a loop with mkdir.
My issue now is that I can't seem to figure out how to download the image, and then immediately store it into these folder's.
What I am attempting here is to use wget by concatenating "fold_name" and "EXT" to get it like a directory "/name_of_folder", and then getting the links to the images (in cell 5,6,7 and 8) and then using wget from these cells, into the directory.
Can anyone assist me with this? I think this should be straight forward enough.
Thank you!
#!/usr/bin/bash
EXT='/'
while read line
do
fold_name= cut -d$',' -f1
concat= "%EXT" "%fold_name"
img1= cut -d$',' -f5
img2= cut -d$',' -f6
img3= cut -d$',' -f7
img4= cut -d$',' -f8
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
done < file.csv
CodePudding user response:
You might use -P
switch to designate target directory, consider following simple example using some files from test-images/png repository
mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
will lead to following structure
black
cs-black-000.png
gray
cs-gray-7f7f7f.png
white
cs-white-fff.png
CodePudding user response:
You should use variables names that are less ambiguous.
You need to provide the directory as part of the output filename.
"%" is not a bash variable designator. That is a formatting directive (for bash, awk, C, etc.).
The following will provide what you want.
#!/usr/bin/bash
DBG=1
INPUT="${1}"
INPUT="file.csv"
cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt
if [ ${DBG} -eq 1 ]
then
echo -e "\n Input file:"
cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
echo -e "\n Hit return to continue ..." ; read k
fi
REPO_ROOT='/tmp'
grep -v '^#' "${INPUT}" |
while read line
do
topic_name=$(echo "${line}" | cut -f1 -d\, )
test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."
folder="${REPO_ROOT}/${topic_name}"
test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."
if [ ! -d "${folder}" ]
then
mkdir "${folder}"
else
rm -f "${folder}/"*
fi
if [ ! -d "${folder}" ]
then
echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2
else
test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'
url1=$(echo "${line}" | cut -d\, -f5 )
url2=$(echo "${line}" | cut -d\, -f6 )
url3=$(echo "${line}" | cut -d\, -f7 )
url4=$(echo "${line}" | cut -d\, -f8 )
test ${DBG} -eq 1 && {
echo -e "\n URLs extracted:"
echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
}
#imageFile1=$( basename "${url1}" | sed 's ^img_ yourImagePrefix_ ' )
#imageFile2=$( basename "${url2}" | sed 's ^img_ yourImagePrefix_ ' )
#imageFile3=$( basename "${url3}" | sed 's ^img_ yourImagePrefix_ ' )
#imageFile4=$( basename "${url4}" | sed 's ^img_ yourImagePrefix_ ' )
imageFile1=$( basename "${url1}" | sed 's ^cs- yourImagePrefix_ ' )
imageFile2=$( basename "${url2}" | sed 's ^cs- yourImagePrefix_ ' )
imageFile3=$( basename "${url3}" | sed 's ^cs- yourImagePrefix_ ' )
test ${DBG} -eq 1 && {
echo -e "\n Image filenames assigned:"
#echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
}
test ${DBG} -eq 1 && {
echo -e "\n WGET process log:"
}
### This form of wget does NOT work for me, although man page says it should.
#wget -P "${folder}" -O "${imageFile1}" "${url1}"
### This form of wget DOES work for me
wget -O "${folder}/${imageFile1}" "${url1}"
wget -O "${folder}/${imageFile2}" "${url2}"
wget -O "${folder}/${imageFile3}" "${url3}"
#wget -O "${folder}/${imageFile3}" "${url3}"
test ${DBG} -eq 1 && {
echo -e "\n Listing of downloaded files:"
ls -l /tmp/topic* 2>>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
}
fi
done
The script is adapted for what I had to work with. :-)