How can I download 1000 images more quickly using a thread pool? It's taking far too long to download these 1000 images with my current script.
Current script:
import requests

image_url = [
    'http://image_eg_001',
    'http://image_eg_002',
    'http://image_eg_003',
]

for img in image_url:
    file_name = img.split('/')[-1]
    print("Downloading File:%s" % file_name)
    r = requests.get(img, stream=True)
    with open(file_name, 'wb') as f:
        for chunk in r:
            f.write(chunk)
CodePudding user response:
You can use asyncio together with the aiohttp package to perform concurrent network requests. A potential solution would be:
import asyncio

import aiohttp


async def download_image(image_url: str, save_path: str, session: aiohttp.ClientSession):
    async with session.get(image_url) as response:
        content = await response.read()
    with open(save_path, "wb") as f:
        f.write(content)


async def main():
    image_urls = [...]
    save_paths = [...]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[download_image(im, p, session) for im, p in zip(image_urls, save_paths)])


if __name__ == "__main__":
    asyncio.run(main())
The download_image() function is responsible for downloading and saving a single image, while the main() function performs the requests concurrently using asyncio.gather().
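With 1000 URLs you may also want to cap how many downloads run at the same time rather than firing them all at once. One common way to do that, shown only as a sketch (the semaphore parameter and the limit of 20 are assumptions to tune, not part of the answer above), is to wrap the request in an asyncio.Semaphore:

import asyncio

import aiohttp


async def download_image(url: str, path: str, session: aiohttp.ClientSession, sem: asyncio.Semaphore):
    # The semaphore caps how many downloads are in flight at the same time.
    async with sem:
        async with session.get(url) as response:
            content = await response.read()
    with open(path, "wb") as f:
        f.write(content)


async def main():
    image_urls = [...]   # same placeholder lists as above
    save_paths = [...]
    sem = asyncio.Semaphore(20)  # assumed cap of 20 concurrent downloads; adjust for your connection
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[download_image(u, p, session, sem)
                               for u, p in zip(image_urls, save_paths)])


if __name__ == "__main__":
    asyncio.run(main())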
CodePudding user response:
You can use the concurrent.futures.ThreadPoolExecutor class. I chose 100 worker threads, but you can change it for your system; it can be more or fewer depending on your case. Too many worker threads can hurt your responsiveness and system resources if the downloads take a long time.
Here is the thread pool solution for downloading the images:
import requests
from concurrent.futures import ThreadPoolExecutor

image_url = [
    'http://image_eg_001',
    'http://image_eg_002',
    'http://image_eg_003',
]

def download(url):
    r = requests.get(url, allow_redirects=False)
    with open(url.split("/")[-1], "wb") as binary:
        binary.write(r.content)

with ThreadPoolExecutor(max_workers=100) as executor:
    executor.map(download, image_url)
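If you want progress output and a record of failed URLs instead of silently losing them, executor.submit() combined with concurrent.futures.as_completed() works as well. The sketch below is an assumed variant of the download() function above (the timeout value and the error handling are additions, not part of the original answer):

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

image_url = [
    'http://image_eg_001',
    'http://image_eg_002',
    'http://image_eg_003',
]

def download(url):
    r = requests.get(url, allow_redirects=False, timeout=30)  # assumed 30 s timeout
    r.raise_for_status()  # surface HTTP errors instead of writing an error page to disk
    file_name = url.split("/")[-1]
    with open(file_name, "wb") as binary:
        binary.write(r.content)
    return file_name

with ThreadPoolExecutor(max_workers=100) as executor:
    futures = {executor.submit(download, url): url for url in image_url}
    for future in as_completed(futures):
        url = futures[future]
        try:
            print("Finished:", future.result())
        except Exception as exc:
            # A failed download no longer stops the rest; log it and move on.
            print("Failed:", url, exc)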