Loop through two columns in csv to scrape images using bulk-bing-image-downloader in bash

Time:09-28

I'm trying to scrape Bing images using bulk-bing-image-downloader. I have a csv file that contains keywords and the names of the folders in which I want the images to be saved:

keyword,folder,search
dog's house,animal,1
book.end,read,0
key chains,house,1

I'd like to use the values under keyword and folder as arguments to search for and download images, and the value under search as a condition: if it is 1, the code performs the search; if it is 0, it skips that row. The basic bulk-bing-image-downloader command is:

./bbid.py -s "keyword" --limit 10 --adult-filter-off -o "folder"

where keyword and folder are the values I'd like to take from each row of the csv file as I loop through it. I currently have the bash command set up as shown below, but I'm super new to shell commands and have zero idea how awk works. Help, please?

awk '
BEGIN {
    -F,
    FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
  if ($1 != "keyword") {
    printf("%s\n", $1)
    ./bbid.py -s $1 --limit 10 --adult-filter-off -o $1
  }
}
' test.csv

Answer:

Since you mentioned you have zero idea how awk works: get the book "Effective AWK Programming", 5th Edition, by Arnold Robbins, and it will teach you how to use AWK. The most important thing for you to understand given the command you posted, though, is this: awk is not shell. Awk and shell are two completely different tools with completely different purposes, each with its own syntax, semantics, and scope. Awk is a tool for manipulating text, while shell is a tool for creating/destroying files and processes and sequencing calls to tools. Awk is the tool that the people who invented shell also invented for shell to call when necessary to manipulate text. That is why a shell command line such as ./bbid.py -s $1 ... cannot simply be written inside an awk script.
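To make that split concrete, here is a minimal sketch (assuming the question's sample CSV, written to a temporary file so the snippet is self-contained) in which awk does only what it is good at: selecting and reshaping text. It skips the header, keeps rows whose search value is 1, and prints keyword and folder; it does not try to run bbid.py.

```shell
#!/usr/bin/env bash
# Write the question's sample CSV to a temporary file so the sketch is self-contained
csv=$(mktemp)
printf '%s\n' 'keyword,folder,search' "dog's house,animal,1" \
  'book.end,read,0' 'key chains,house,1' > "$csv"

# awk's job here is text only: skip the header (NR > 1), keep rows with
# search == 1, and print "keyword -> folder". It starts no other programs.
rows=$(awk -F, 'NR > 1 && $3 == 1 { print $1 " -> " $2 }' "$csv")
printf '%s\n' "$rows"

rm -f "$csv"
```

This prints `dog's house -> animal` and `key chains -> house`. Launching bbid.py for each of those rows is then the shell's job.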

This shell script might be what you're trying to do:

# tail -n +2 skips the header line; IFS=',' makes read split each row at commas
while IFS=',' read -r k f _; do
    echo ./bbid.py -s "$k" --limit 10 --adult-filter-off -o "$f"
done < <(tail -n +2 file)

With your sample CSV as file, that prints:

./bbid.py -s dog's house --limit 10 --adult-filter-off -o animal
./bbid.py -s book.end --limit 10 --adult-filter-off -o read
./bbid.py -s key chains --limit 10 --adult-filter-off -o house

Remove the echo when you're done with initial testing.
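The loop above prints a command for every data row, including book.end, whose search value is 0. Below is a sketch of a variant that also honors the search column. print_bbid_commands is a hypothetical helper name, the carriage-return stripping is an extra safeguard in case the CSV was saved with Windows (CRLF) line endings, and echo is kept so it is still a dry run.

```shell
#!/usr/bin/env bash
# print_bbid_commands is a hypothetical helper name used only in this sketch.
# It prints (dry run) one bbid.py command per CSV row whose search column is 1.
print_bbid_commands() {
  local keyword folder search
  while IFS=',' read -r keyword folder search; do
    search=${search%$'\r'}            # drop a trailing CR from CRLF files
    [[ $search == 1 ]] || continue    # skip rows whose search value is not 1
    # echo keeps this a dry run; remove it to actually call bbid.py
    echo ./bbid.py -s "$keyword" --limit 10 --adult-filter-off -o "$folder"
  done < <(tail -n +2 "$1")           # tail -n +2 skips the header line
}

# Demo on the sample CSV from the question, written to a temporary file
csv=$(mktemp)
cat > "$csv" <<'EOF'
keyword,folder,search
dog's house,animal,1
book.end,read,0
key chains,house,1
EOF
print_bbid_commands "$csv"
rm -f "$csv"
```

On the sample data this prints only the dog's house and key chains commands; the book.end row is skipped.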

Answer:

Your csv file doesn't seem to use quoting mechanisms like "…", so you don't need GNU awk's FPAT. Simply splitting at commas seems sufficient.
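For contrast, here is a sketch of where FPAT would matter, using a hypothetical row whose keyword is quoted because it contains a comma. Plain -F, splitting cuts the quoted field in two, while GNU awk's FPAT, which describes what a field looks like rather than what separates fields, keeps it whole. The pattern ([^,]+)|("[^"]+") matches either a run of non-comma characters or a double-quoted string; the FPAT part only runs if gawk is installed.

```shell
#!/usr/bin/env bash
# Hypothetical CSV row whose keyword is quoted because it contains a comma
row='"cats, kittens",animal,1'

# Any awk with -F, splits at every comma, so the quoted keyword is cut in two
plain=$(printf '%s\n' "$row" | awk -F, '{ print $2 }')
echo "awk -F,   field 2: [$plain]"

# GNU awk only: FPAT says a field is either a run of non-commas or a quoted string
if command -v gawk >/dev/null 2>&1; then
  fpat=$(printf '%s\n' "$row" | gawk -v FPAT='([^,]+)|("[^"]+")' '{ print $2 }')
  echo "gawk FPAT field 2: [$fpat]"
fi
```

With -F, the second field comes out as ` kittens"`, whereas with FPAT it is `animal`. For the question's unquoted CSV, the simpler -F, split is enough.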

In that case, a very bash-y way to do this would be:

tail -n +2 test.csv |  # exclude header line
grep ',1$' |           # include only lines that have search=1
cut -d, -f1-2 |        # from those lines, select keyword and folder
tr , '\n' |            # put keyword and folder on separate lines
xargs -d '\n' -n 2 sh -c './bbid.py -s "$1" --limit 10 --adult-filter-off -o "$2"' sh
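A quick way to check how GNU xargs groups arguments before running anything real is to put printf in place of bbid.py. In this sketch, each keyword and folder sits on its own line, -d '\n' (a GNU xargs option) treats every line as one argument so the space in dog's house survives, -n 2 passes two arguments per command, and sh -c receives them as $1 and $2; the trailing sh only fills $0. The hard-coded input mimics the search=1 rows of the sample CSV.

```shell
#!/usr/bin/env bash
# Keyword/folder values from the search=1 rows, one per line (GNU xargs needed for -d)
pairs=$(printf '%s\n' "dog's house" animal "key chains" house |
  xargs -d '\n' -n 2 sh -c 'printf "keyword=[%s] folder=[%s]\n" "$1" "$2"' sh)
printf '%s\n' "$pairs"
```

This prints one line per pair, `keyword=[dog's house] folder=[animal]` and `keyword=[key chains] folder=[house]`, so each command would get the keyword and the folder as two separate, intact arguments.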

Of course you can stick to awk too, which is a scripting language different from bash. The following script assumes that keyword and folder do not contain special symbols like ", \, `, or $.

awk -F, 'NR>1 && $3==1 {system("./bbid.py -s \""$1"\" --limit 10 --adult-filter-off -o \""$2"\"")}' test.csv
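To see why that assumption matters, here is a small sketch with a hypothetical keyword containing $HOME. system() hands its string to sh, so anything that is special inside double quotes in sh gets interpreted instead of passed through. echo stands in for bbid.py so nothing is downloaded, and the --limit/--adult-filter-off flags are left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical CSV whose keyword contains a '$'. system() runs its string with sh,
# and sh expands $HOME inside the double quotes that the awk script adds.
out=$(printf '%s\n' 'keyword,folder,search' 'cost $HOME,misc,1' |
  awk -F, 'NR>1 && $3==1 { system("echo ./bbid.py -s \"" $1 "\" -o \"" $2 "\"") }')
printf '%s\n' "$out"
```

Instead of the literal keyword `cost $HOME`, bbid.py would receive `cost` followed by your home directory path.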