I'd like to download images and rename them by their index.
The only thing I have is a JSON Lines file that looks like this:
{"image": 136, "url": "https://enkistroy.ru/800/600/https/avatars.mds.yandex.net/get-zen_doc/1852570/pub_5dbb15dfba281e00b14b4174_5dbb2300e3062c00b072ecce/scale_1200"}
{"image": 137, "url": "https://forums.cdprojektred.com/index.php?attachments/unbenannt-jpg.6709974/"}
{"image": 138, "url": "https://64.media.tumblr.com/1309f93790c53ccacc333b28d12dac66/tumblr_p9fqienz291x5m66ko1_r1_1280.png"}
{"image": 139, "url": "https://i.pinimg.com/originals/4c/21/0c/4c210ca963cf4d52636615ac08126b05.jpg"}
I've tried using jq, grep, wget and curl:
egrep -o 'https:[^\"]*jpg' images.json | xargs -n 1 curl -O
This only downloads the https links ending in jpg, and it doesn't rename them.
Maybe writing a Python script would be easier?
EDIT:
I tried the following, but I'm not sure how to rename the files with curl (curl -O "#1.jpg"?):
jq -r '.url' images.json | parallel curl -O
Answer:
Sounds like you want something like this:
$ cat tst.sh
#!/usr/bin/env bash
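idx=0
while IFS= read -r url; do
    (( ++idx ))                 # 1-based counter used for the output file names
    sfx="${url##*.}"            # everything after the last "." in the URL
    case "$sfx" in
        png ) ;;
        * ) sfx='jpg' ;;        # anything that isn't png is assumed to be JPEG
    esac
    echo curl "$url" -o "${idx}.${sfx}"
done < <(jq -r '.url' "${1:-images.json}")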
$ ./tst.sh
curl https://enkistroy.ru/800/600/https/avatars.mds.yandex.net/get-zen_doc/1852570/pub_5dbb15dfba281e00b14b4174_5dbb2300e3062c00b072ecce/scale_1200 -o 1.jpg
curl https://forums.cdprojektred.com/index.php?attachments/unbenannt-jpg.6709974/ -o 2.jpg
curl https://64.media.tumblr.com/1309f93790c53ccacc333b28d12dac66/tumblr_p9fqienz291x5m66ko1_r1_1280.png -o 3.png
curl https://i.pinimg.com/originals/4c/21/0c/4c210ca963cf4d52636615ac08126b05.jpg -o 4.jpg
The above assumes that any URL that doesn't end in .png is a JPEG. Massage that to suit whatever rules you know of for identifying the image types to use as the file suffixes, or restrict the curl to only the JPEG files.
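If you want a slightly stricter URL rule than "whatever follows the last dot", here is a minimal sketch. The function name sfx_for_url and its rules (ignore any ?query or #fragment, look only at the last path segment, keep png/gif/webp, fall back to jpg) are assumptions for illustration, not something from the question, and the ${ext,,} lowercasing needs bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess an image file suffix from a URL.
# Assumed rules (adjust to your data):
#   - ignore any ?query or #fragment
#   - look only at the last path segment, so dots in the host name don't count
#   - keep png, gif or webp; map everything else (jpg, jpeg, unknown) to jpg
sfx_for_url() {
    local path=${1%%[?#]*}          # drop ?query and #fragment
    path=${path%/}                  # drop one trailing slash
    local last=${path##*/}          # last path segment
    local ext=${last##*.}           # text after the final dot
    if [[ $ext == "$last" ]]; then  # no dot in the segment: no extension
        ext=''
    fi
    ext=${ext,,}                    # lowercase (needs bash 4+)
    case "$ext" in
        png|gif|webp ) echo "$ext" ;;
        * ) echo jpg ;;
    esac
}
```

In either script you could then replace the case block with sfx=$(sfx_for_url "$url"). For the four sample URLs it gives the same suffixes as the case statement (jpg, jpg, png, jpg), but it also labels URLs such as .../photo.PNG or .../photo.png?w=640 as png, where the case statement above would fall back to jpg.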
Obviously, remove the echo when you're done testing and want to actually execute the curl.
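Once the echo is gone, it can help to make failed downloads visible instead of silently saving an HTML error page under an image name. A minimal sketch, assuming curl and jq are on your PATH; fetch and main are hypothetical helper names, and the -f/-sS/-L flags are a suggestion rather than something the question requires:

```shell
#!/usr/bin/env bash
# Hypothetical helper: download one URL to a given file name, reporting failures.
#   -f   fail on HTTP errors (404, 500, ...) instead of saving the error page
#   -sS  hide the progress meter but still print error messages
#   -L   follow redirects
fetch() {
    local url=$1 out=$2
    if curl -fsSL -o "$out" "$url"; then
        return 0
    fi
    echo "failed: $url" >&2
    rm -f -- "$out"             # don't leave an empty or partial file behind
    return 1
}

# Read "image url" pairs (same jq call as the second script) and download each one.
main() {
    local idx url sfx
    while read -r idx url; do
        sfx="${url##*.}"
        case "$sfx" in
            png ) ;;
            * ) sfx='jpg' ;;
        esac
        fetch "$url" "${idx}.${sfx}"
    done < <(jq -j '.image, " ", .url, "\n"' "${1:-images.json}")
}

main "$@"
```

Run it with the JSON Lines file as the first argument (images.json by default). A URL that fails is reported on stderr and skipped, and the loop moves on to the next one.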
To use the "image" value from the JSON as the index instead (see the comments):
$ cat tst.sh
#!/usr/bin/env bash
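# jq -j prints each record as "<image> <url>" plus the explicit "\n";
# read then splits that line into idx and url
while read -r idx url; do
    sfx="${url##*.}"
    case "$sfx" in
        png ) ;;
        * ) sfx='jpg' ;;
    esac
    echo curl "$url" -o "${idx}.${sfx}"
done < <( jq -j '.image, " ", .url, "\n"' "${1:-images.json}" )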
$ ./tst.sh
curl https://enkistroy.ru/800/600/https/avatars.mds.yandex.net/get-zen_doc/1852570/pub_5dbb15dfba281e00b14b4174_5dbb2300e3062c00b072ecce/scale_1200 -o 136.jpg
curl https://forums.cdprojektred.com/index.php?attachments/unbenannt-jpg.6709974/ -o 137.jpg
curl https://64.media.tumblr.com/1309f93790c53ccacc333b28d12dac66/tumblr_p9fqienz291x5m66ko1_r1_1280.png -o 138.png
curl https://i.pinimg.com/originals/4c/21/0c/4c210ca963cf4d52636615ac08126b05.jpg -o 139.jpg
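On renaming with curl itself: the #1 placeholder in curl's output name only refers to a {...} or [...] glob in the URL, so it can't pick up the index from the JSON. One alternative is to generate a curl config with one url/output pair per image and let a single curl process download everything. A sketch, assuming your curl reads a config from stdin via -K -; the name to_curl_config is hypothetical, and the function assumes the URLs contain no double quotes or backslashes, which would need escaping in a config file:

```shell
#!/usr/bin/env bash
# Turn "image url" lines on stdin into curl config entries on stdout.
# Each url/output pair tells curl which file name to save that URL under.
# Usage: jq -j '.image, " ", .url, "\n"' images.json | to_curl_config | curl -fsSL -K -
to_curl_config() {
    local idx url sfx
    while read -r idx url; do
        sfx="${url##*.}"
        case "$sfx" in
            png ) ;;
            * ) sfx='jpg' ;;    # same JPEG default as the scripts above
        esac
        printf 'url = "%s"\noutput = "%s.%s"\n' "$url" "$idx" "$sfx"
    done
}
```

With curl 7.66 or later you can add --parallel to the curl command to fetch several files at once. Because several URLs are given, curl carries on with the next one when a download fails, unless you pass --fail-early.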