How to download about 1000 small files (~20Kb) with the same URL mask using bash

Time:05-31

There are about 1000 small JPEG files (roughly 20 KB each) located at URLs like:

https://example.com/file=1.0
https://example.com/file=1.1
...
https://example.com/file=1.973
https://example.com/file=1.974

How can I download them with a bash script? I do not know how to write scripts, but I think there is a simple way, for example using wget. They all share the same filename.jpeg, so I need to save them under consecutive names like filename-1.jpg, filename-2.jpg, and so on.

CodePudding user response:

curl has a built-in feature for downloading multiple URLs from a generated sequence and reusing that same sequence in the saved file name:

curl \
  --user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36' \
  --parallel \
  --url "https://example.com/file=1.[0-974]" \
  --output "filename-#1.jpg"

See: https://curl.se/docs/manpage.html#-o

-o, --output <file>

Write output to <file> instead of stdout. If you are using {} or [] to fetch multiple documents, you should quote the URL and you can use '#' followed by a number in the <file> specifier. That variable will be replaced with the current string for the URL being fetched. Like in:

 curl "http://{one,two}.example.com" -o "file_#1.txt"

or use several variables like:

 curl "http://{site,host}.host[1-5].com" -o "#1_#2"

You may use this option as many times as the number of URLs you have. For example, if you specify two URLs on the same command line, you can use it like this:

  curl -o aa example.com -o bb example.net

and the order of the -o options and the URLs does not matter, just that the first -o is for the first URL and so on, so the above command line can also be written as

  curl example.com example.net -o aa -o bb

See also the --create-dirs option to create the local directories dynamically. Specifying the output as '-' (a single dash) will force the output to be done to stdout.

To suppress response bodies, you can redirect output to /dev/null:

  curl example.com -o /dev/null

Or for Windows use nul:

  curl example.com -o nul

Examples:

curl -o file https://example.com
curl "http://{one,two}.example.com" -o "file_#1.txt"
curl "http://{site,host}.host[1-5].com" -o "#1_#2"
curl -o file https://example.com -o file2 https://example.net

See also -O, --remote-name, --remote-name-all and -J, --remote-header-name.
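To see the [range] globbing and the #1 substitution in action without touching the network, you can point curl at local file:// URLs. This is a small sketch; the directory and file names (demo, src-N.txt, copy-N.txt) are my own illustration, not from the original answer:

```shell
# Create three dummy source files, then let curl's [0-2] glob fetch each
# one over file:// and map the index into the output name via #1.
mkdir -p demo
for i in 0 1 2; do
  echo "payload $i" > "demo/src-$i.txt"
done
curl -s "file://$PWD/demo/src-[0-2].txt" -o "demo/copy-#1.txt"
# demo/ now contains copy-0.txt, copy-1.txt and copy-2.txt, each a copy
# of the matching src-N.txt.
```

The same mechanism is what makes "filename-#1.jpg" in the answer above expand to filename-0.jpg through filename-974.jpg.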

CodePudding user response:

You can use a for loop with seq like this (quoting the URL and using wget -O so each file gets a consecutive name, as the question asks):

for i in $(seq 0 974); do
  wget -O "filename-$i.jpg" "https://example.com/file=1.$i"
done
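If you are on bash 3.0 or newer, brace expansion can replace seq, and you can build the whole URL list up front and hand it to a single wget process instead of launching one per file. A sketch under that assumption (the urls.txt name is my choice, not from the answer):

```shell
# printf repeats its format string for every argument, so {0..974}
# expands into 975 URLs, one per line.
printf 'https://example.com/file=1.%s\n' {0..974} > urls.txt
# wget -i urls.txt   # uncomment to actually download (network required)
```

One wget invocation reusing a single connection is noticeably faster for ~1000 tiny files than 1000 separate processes.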