Python Multi-threaded requests are slower than Sequential?

I'm trying to wrap my head around threading in Python and have learned that it's great for I/O-heavy tasks. That being said, when I wrote a simple script to pull stock prices from an API, I saw that my multithreaded code ran slower than my sequential code.

Can someone explain why this is the case?

import requests
import os
from threading import Thread
import time


api_key = os.getenv('ALPHAVANTAGE_API_KEY')
url = 'https://www.alphavantage.co/query?function=OVERVIEW&symbol={}&apikey={}'
symbols = ['AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA'] * 6  # 30 symbols in total
results = []


def get_price(symbol):
    print(f'getting {symbol} price')
    response = requests.get(url.format(symbol, api_key))
    results.append(response.json())


print("Timer started...")
threads = [Thread(target=get_price, args=(symbol,)) for symbol in symbols]


if __name__ == '__main__':

# run_tasks()
    start = time.time()
    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

    # for symbol in symbols:
    #     get_price(symbol)

    end = time.time()
    total_time = end - start
    print("It took {} seconds to make {} API calls".format(total_time, len(symbols)))

The multithreaded code printed: It took 19.715637922286987 seconds to make 30 API calls

Sequential: It took 15.80090594291687 seconds to make 30 API calls

CodePudding user response：

You'd better recheck the durations. I ran the same code and these are the outputs I got.

Program with threads -> It took 1.6476280689239502 seconds to make 30 API calls

Sequential program -> It took 17.5554039478302 seconds to make 30 API calls
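To check the measurement itself without depending on the network, here is a minimal offline sketch. It assumes a hypothetical fake_request() that just sleeps to stand in for the API call, and it uses ThreadPoolExecutor instead of raw Thread objects purely for brevity. If threads really overlap the waiting time, the threaded run should finish far sooner than the sequential one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(symbol, delay=0.1):
    """Hypothetical stand-in for an API call: sleep to simulate network latency."""
    time.sleep(delay)
    return symbol


def run_sequential(symbols):
    # One call after another: total time is roughly len(symbols) * delay.
    start = time.perf_counter()
    results = [fake_request(symbol) for symbol in symbols]
    return results, time.perf_counter() - start


def run_threaded(symbols, max_workers=10):
    # Calls overlap while they wait; pool.map keeps results in input order.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_request, symbols))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    symbols = ['AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA'] * 2
    _, seq_time = run_sequential(symbols)
    _, thr_time = run_threaded(symbols)
    print(f'sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s')
```

If this sketch shows the expected speedup on your machine but the real script does not, the difference is more likely in the requests themselves than in the threading.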

CodePudding user response：

Thank you to everyone for responding. I figured out the issue with my code. A requests Session is not guaranteed to be thread-safe when shared between threads (more on that here: https://github.com/psf/requests/issues/2766)

I believe my program was slower because threads were trying to access the same requests session. To fix this, I made the following change:


import threading

# Attributes set on this object are visible only to the thread that set them.
thread_local = threading.local()

def create_session():
    # Create the session lazily, once per thread, and reuse it afterwards.
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

def get_price(symbol):
    session = create_session()
    print(f'getting {symbol} price')
    response = session.get(url.format(symbol, api_key))
    print(f'price for {symbol} retrieved')

I created a threading.local() object so that each thread calling get_price() creates its own session the first time and reuses it on later calls, instead of sharing one session across threads.
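For anyone who wants to see the per-thread behaviour in isolation, here is a small offline sketch. FakeSession, get_session(), fetch() and fetch_all() are hypothetical names made up for this illustration; FakeSession merely stands in for requests.Session so the example runs without the network. It records which thread and which session object handled each symbol.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Hypothetical stand-in for requests.Session so the sketch runs offline."""

    def get(self, url):
        return f'response for {url}'


thread_local = threading.local()


def get_session():
    # The first call in a given thread creates the session; later calls reuse it.
    if not hasattr(thread_local, 'session'):
        thread_local.session = FakeSession()
    return thread_local.session


def fetch(symbol):
    session = get_session()
    session.get(symbol)
    # Report the worker thread and the session object that served this symbol.
    return symbol, threading.current_thread().name, id(session)


def fetch_all(symbols, max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, symbols))


if __name__ == '__main__':
    for symbol, thread_name, session_id in fetch_all(['AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA']):
        print(symbol, thread_name, session_id)
```

With a pool, each worker thread handles several symbols, so its session is actually reused. With one Thread per symbol, as in the original script, every thread still builds its own session exactly once.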