Download URLs in JSON file and save with a specific naming pattern

Time:10-05

I'd like to download images and rename them by their index.

The only thing I have is a JSON Lines file that looks like this:

{"image": 136, "url": "https://enkistroy.ru/800/600/https/avatars.mds.yandex.net/get-zen_doc/1852570/pub_5dbb15dfba281e00b14b4174_5dbb2300e3062c00b072ecce/scale_1200"}
{"image": 137, "url": "https://forums.cdprojektred.com/index.php?attachments/unbenannt-jpg.6709974/"}
{"image": 138, "url": "https://64.media.tumblr.com/1309f93790c53ccacc333b28d12dac66/tumblr_p9fqienz291x5m66ko1_r1_1280.png"}
{"image": 139, "url": "https://i.pinimg.com/originals/4c/21/0c/4c210ca963cf4d52636615ac08126b05.jpg"}

I've tried using jq, grep, wget and curl, for example:

egrep -o 'https:[^\"]*jpg' images.json | xargs -n 1 curl -O

This only downloads the HTTPS links that end in jpg, and it doesn't rename the files.

Maybe writing a Python script would be easier?

EDIT:

I tried the following, but I'm not sure how to rename the files with curl (something like curl -O "#1.jpg"?):

jq -r '.url' images.json | parallel curl -O

Answer:

Sounds like you want something like this:

$ cat tst.sh
#!/usr/bin/env bash

idx=0
while IFS= read -r url; do
    (( idx++ ))
    sfx="${url##*.}"
    case "$sfx" in
        png ) ;;
        * ) sfx='jpg' ;;
    esac
    echo curl "$url" -o "${idx}.${sfx}"
done < <(jq -r '.url' "${1:-images.json}")

$ ./tst.sh
curl https://enkistroy.ru/800/600/https/avatars.mds.yandex.net/get-zen_doc/1852570/pub_5dbb15dfba281e00b14b4174_5dbb2300e3062c00b072ecce/scale_1200 -o 1.jpg
curl https://forums.cdprojektred.com/index.php?attachments/unbenannt-jpg.6709974/ -o 2.jpg
curl https://64.media.tumblr.com/1309f93790c53ccacc333b28d12dac66/tumblr_p9fqienz291x5m66ko1_r1_1280.png -o 3.png
curl https://i.pinimg.com/originals/4c/21/0c/4c210ca963cf4d52636615ac08126b05.jpg -o 4.jpg

The above assumes that any URL that doesn't end in .png is a JPEG. Adjust that to whatever rules you know of for identifying the image type to use as the file suffix, or restrict the curl calls to only the JPEG files.
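The URL-based rule has to guess for URLs like the first two samples, which have no recognizable file extension. One alternative, sketched here as an assumption rather than part of the answer, is to download to a temporary name and choose the suffix from the file's leading "magic" bytes. The helper name sniff_suffix is hypothetical, and it only recognizes PNG, JPEG, GIF and WebP signatures, printing "bin" for anything else.

```bash
#!/usr/bin/env bash

# sniff_suffix FILE
# Hypothetical helper: print a file suffix based on the file's leading
# "magic" bytes instead of its URL. Only PNG, JPEG, GIF and WebP are
# recognized; anything else prints "bin".
sniff_suffix() {
    local hex
    # Dump up to the first 12 bytes as hex pairs, then strip spaces/newlines.
    hex=$(od -An -tx1 -N12 -- "$1" | tr -d ' \n')
    case "$hex" in
        89504e470d0a1a0a*)            echo png ;;  # \x89 "PNG" \r \n \x1a \n
        ffd8ff*)                      echo jpg ;;  # JPEG start-of-image marker
        474946383761*|474946383961*)  echo gif ;;  # "GIF87a" or "GIF89a"
        52494646????????57454250)     echo webp ;; # "RIFF" <4-byte size> "WEBP"
        *)                            echo bin ;;
    esac
}
```

Inside the answer's loop, the curl line could then become something like curl -sS "$url" -o "$idx.part" && mv -- "$idx.part" "$idx.$(sniff_suffix "$idx.part")", so the suffix reflects what the server actually sent.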

Remove the echo when you're done testing and want the curl commands to actually run.
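Since the question also tried parallel, here is a hedged sketch of a concurrent variant that keeps the dry-run idea behind a switch instead of a hand-edited echo: with DRY_RUN=1 it only prints the curl commands. The names download_all and DRY_RUN are hypothetical. The sketch names each file by the "image" field, uses xargs -P (supported by GNU and BSD xargs, though not required by POSIX) to run up to four downloads at once, and assumes the URLs contain no whitespace or quote characters, because xargs splits and unquotes its input.

```bash
#!/usr/bin/env bash

# download_all [FILE]
# Hypothetical concurrent variant: jq prints "<image> <url>" per line,
# and xargs hands two words at a time to a small sh script, keeping up
# to 4 of those scripts running at once. DRY_RUN=1 only prints commands.
download_all() {
    local json="${1:-images.json}"
    jq -r '"\(.image) \(.url)"' "$json" |
        xargs -n 2 -P 4 sh -c '
            dry=$1 idx=$2 url=$3
            # Same suffix rule as above: .png stays png, all else is jpg.
            case "$url" in
                *.png) sfx=png ;;
                *)     sfx=jpg ;;
            esac
            if [ "$dry" = 1 ]; then
                echo curl -sS "$url" -o "$idx.$sfx"
            else
                curl -sS "$url" -o "$idx.$sfx"
            fi
        ' sh "${DRY_RUN:-0}"
}
```

Running DRY_RUN=1 download_all images.json prints the commands (in no fixed order, since they run concurrently); download_all images.json without DRY_RUN performs the downloads.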

To use each line's "image" value as the index instead:

$ cat tst.sh
#!/usr/bin/env bash

while read -r idx url; do
    sfx="${url##*.}"
    case "$sfx" in
        png ) ;;
        * ) sfx='jpg' ;;
    esac
    echo curl "$url" -o "${idx}.${sfx}"
done < <( jq -j '.image, " ", .url, "\n"' "${1:-images.json}" )

$ ./tst.sh
curl https://enkistroy.ru/800/600/https/avatars.mds.yandex.net/get-zen_doc/1852570/pub_5dbb15dfba281e00b14b4174_5dbb2300e3062c00b072ecce/scale_1200 -o 136.jpg
curl https://forums.cdprojektred.com/index.php?attachments/unbenannt-jpg.6709974/ -o 137.jpg
curl https://64.media.tumblr.com/1309f93790c53ccacc333b28d12dac66/tumblr_p9fqienz291x5m66ko1_r1_1280.png -o 138.png
curl https://i.pinimg.com/originals/4c/21/0c/4c210ca963cf4d52636615ac08126b05.jpg -o 139.jpg
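On the question's curl -O "#1.jpg" attempt: in curl, #1 in an output name is replaced by whatever matched the first glob ([1-4] or {a,b}) in the URL, so it cannot pull a name out of the JSON. A different way to let curl do the naming, sketched here as an alternative rather than part of the answer, is to have jq emit curl config-file lines, a url = "..." line followed by an output = "..." line per entry, and feed them to curl -K - on standard input. The function name make_curl_config is hypothetical; it keeps the answer's rule that anything not ending in .png is a JPEG, and it assumes the URLs contain no double quotes or backslashes, which would need escaping inside the quoted config values.

```bash
#!/usr/bin/env bash

# make_curl_config [FILE]
# Hypothetical helper: turn each JSON line into a url/output pair in
# curl's config-file syntax, naming the output after the "image" field.
make_curl_config() {
    jq -r '
        .url as $u
        | ($u | if endswith(".png") then "png" else "jpg" end) as $sfx
        | "url = \"\($u)\"\noutput = \"\(.image).\($sfx)\""
    ' "${1:-images.json}"
}

# Usage: make_curl_config images.json | curl -K -
```

curl pairs each output with a url in the order they appear, so for the sample file a single curl process would save 136.jpg, 137.jpg, 138.png and 139.jpg.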