Multithreading not achieving performance difference Python

Time:04-21

Below is a program that makes multiple GET requests and writes the response images to my directory. The requests are meant to run in separate threads, and so finish sooner than they would without threads, but I'm not seeing a performance difference.

Printing active_count() shows that 9 threads are created. However, the program still takes around 40 seconds whether or not I use threading.

Below is the version that uses threading.

from threading import active_count
import requests
import time
import concurrent.futures

img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
    'https://images.unsplash.com/photo-1549692520-acc6669e2f0c'
]

t1 = time.perf_counter()


def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)
    print(active_count())


t2 = time.perf_counter()

print(f'Finished in {t2-t1} seconds')

Below is the version without threading (same imports and img_urls as above).

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


for img_url in img_urls:
    download_image(img_url)

Could someone explain why this is happening? Thanks

CodePudding user response:

This is the result I got with your code, with the start and end time printed next to each download. The overall time is about the same (on my "normal" network, not the slow one I mentioned in my comment).

The reason is that multiple threads don't increase I/O throughput or bandwidth; the limit could also be the website itself. The issue does not seem to come from your code.

EDIT (misleading statement): as MisterMiyagi mentioned in the comment below (read his comment, he explains why), threading should increase I/O throughput, which is why I see a gain of about 10 s on a slow network (the limited connection in my work lab). It doesn't increase I/O throughput or bandwidth in this specific case (with full bandwidth on my "normal" connection). That may have many causes but, in my opinion, the code itself is not one of them.
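
To illustrate the point without touching the network, here is a minimal sketch that replaces the real requests with time.sleep. Everything in it is an assumption made for illustration: DELAY, N_REQUESTS, the two fake download functions, and the lock that stands in for a server handling one client at a time. It only shows that threads overlap blocking waits when nothing throttles them, and stop helping when the other side serializes the work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05     # hypothetical per-request latency, in seconds
N_REQUESTS = 8   # hypothetical number of downloads

# Stands in for a server (or link) that serves one request at a time.
server_lock = threading.Lock()


def fake_download(_):
    """Unthrottled stand-in: the call just blocks, like a socket read."""
    time.sleep(DELAY)


def fake_throttled_download(_):
    """Throttled stand-in: only one 'request' can be in flight at once."""
    with server_lock:
        time.sleep(DELAY)


def run_threaded(task):
    """Run N_REQUESTS calls of task on a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=N_REQUESTS) as executor:
        # Consuming the iterator waits for every call to finish.
        list(executor.map(task, range(N_REQUESTS)))
    return time.perf_counter() - start


unthrottled = run_threaded(fake_download)
throttled = run_threaded(fake_throttled_download)
print(f'unthrottled: {unthrottled:.2f}s, throttled: {throttled:.2f}s')
```

In the unthrottled case the total is close to a single DELAY; in the throttled case it is close to N_REQUESTS times DELAY, which is what a sequential loop would take.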

I also tried with max_workers=5 and got the same overall time.

    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 1.0464828 - 1.7136098
    photo-1532009324734-20a7a5813719.jpg was downloaded... 1.7140197 - 5.6327612
    photo-1524429656589-6633a470097c.jpg was downloaded... 5.6339666 - 8.3146478
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 8.3160157 - 10.474087
    photo-1564135624576-c5c88640f235.jpg was downloaded... 10.4749598 - 11.2431941
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 11.2436369 - 15.6939695
    photo-1522364723953-452d3431c267.jpg was downloaded... 15.6954112 - 18.3257819
    photo-1513938709626-033611b8cc03.jpg was downloaded... 18.3269668 - 21.0607191
    photo-1507143550189-fed454f93097.jpg was downloaded... 21.0621265 - 22.2371699
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 22.2375931 - 26.4375676
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 26.4393404 - 28.3477933
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 28.348679 - 30.4626719
    photo-1516972810927-80185027ca84.jpg was downloaded... 30.4636931 - 32.2621345
    photo-1550439062-609e1531270e.jpg was downloaded... 32.2628976 - 34.7331719
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 34.7341393 - 35.5910094
    Finished in 34.545366900000005 seconds
    21
    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 35.5960486 - 46.1692758
    photo-1564135624576-c5c88640f235.jpg was downloaded... 35.6110777 - 47.3780254
    photo-1507143550189-fed454f93097.jpg was downloaded... 35.6265503 - 47.4433963
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 35.6692061 - 49.7097683
    photo-1516972810927-80185027ca84.jpg was downloaded... 35.6420564 - 57.2326763
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 35.6340008 - 61.4597509
    photo-1550439062-609e1531270e.jpg was downloaded... 35.6637577 - 62.0488296
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 35.6072146 - 63.4139648
    photo-1513938709626-033611b8cc03.jpg was downloaded... 35.6223106 - 63.8149815
    photo-1524429656589-6633a470097c.jpg was downloaded... 35.6032493 - 63.8284464
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 35.6352735 - 65.0513042
    photo-1522364723953-452d3431c267.jpg was downloaded... 35.6182243 - 65.5005548
    photo-1532009324734-20a7a5813719.jpg was downloaded... 35.5994888 - 66.2930857
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 35.6144996 - 67.8115219
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 35.6301133 - 68.5357319
    Finished in 32.946069800000004 seconds
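
The answer does not show how the start and end times were recorded. One possible way, sketched here under assumptions, is to wrap the download function so that each call returns its own time.perf_counter() readings. The names timed, fake_download, and the file names are hypothetical, and fake_download sleeps instead of calling requests.get so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(task):
    """Wrap task so each call returns (argument, start, end)."""
    def wrapper(arg):
        start = time.perf_counter()
        task(arg)
        end = time.perf_counter()
        return arg, start, end
    return wrapper


def fake_download(name):
    # Hypothetical stand-in for download_image; no network access.
    time.sleep(0.02)


names = ['a.jpg', 'b.jpg', 'c.jpg']  # hypothetical file names

with ThreadPoolExecutor() as executor:
    # map() yields results in input order, whatever order the calls finish in.
    records = list(executor.map(timed(fake_download), names))

for name, start, end in records:
    print(f'{name} was downloaded... {start} - {end}')
```

Start times that are nearly identical across rows indicate the calls really did begin in parallel, so any remaining slowness lies in how long each call takes to complete.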

EDIT 2 (more testing): I tried with one of my own web servers (same code, just a different image list) and got an overall decrease of 60-70% in download time. In that case it works best with a limited number of workers. The problem comes from the website, not your code.
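
A rough way to look for a suitable worker count is to time the same job with several max_workers values. The sketch below does that against a simulated server, so every number in it is an assumption: SERVER_LIMIT, DELAY, the semaphore, and fake_download are illustrative stand-ins, not measurements of any real site.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

SERVER_LIMIT = 2   # hypothetical: requests the server handles at once
DELAY = 0.03       # hypothetical: seconds per request
server_slots = threading.Semaphore(SERVER_LIMIT)


def fake_download(_):
    """Stand-in for a download that waits for a free server slot."""
    with server_slots:
        time.sleep(DELAY)


def time_with_workers(task, items, max_workers):
    """Run task over items with the given pool size; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(task, items))
    return time.perf_counter() - start


items = range(8)
timings = {n: time_with_workers(fake_download, items, n) for n in (1, 2, 8)}
for n, elapsed in timings.items():
    print(f'max_workers={n}: {elapsed:.2f}s')
```

In this simulation, going from 1 to 2 workers roughly halves the time, while going from 2 to 8 changes little because the simulated server is already saturated. Against a real site the same loop can be run with download_image and img_urls in place of the stand-ins.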

CodePudding user response:

I can see some performance improvement when using the multiprocessing package.

import multiprocessing
import time
from multiprocessing import Pool

import requests

# img_urls is the same list of image URLs as in the question.


def download_image(img_url: str) -> None:
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


if __name__ == '__main__':
    t1 = time.perf_counter()

    with Pool(processes=multiprocessing.cpu_count() - 1 or 1) as pool:
        pool.map(download_image, img_urls)

    t2 = time.perf_counter()

    print(f'Finished in {t2 - t1} seconds')