I'm trying to wrap my head around threading in Python and have learned that it's great for I/O-heavy tasks. That said, when I wrote a simple script to pull stock prices from an API, I saw that my multithreaded code ran slower than my sequential code.
Can someone explain why this is the case?
import requests
import os
from threading import Thread
import time
api_key = os.getenv('ALPHAVANTAGE_API_KEY')
url = 'https://www.alphavantage.co/query?function=OVERVIEW&symbol={}&apikey={}'
symbols = ['AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA','AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA','AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA','AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA','AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA','AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA']
results = []
def get_price(symbol):
    print(f'getting {symbol} price')
    response = requests.get(url.format(symbol, api_key))
    results.append(response.json())
print("Timer started...")
threads = [Thread(target=get_price, args=(symbol,)) for symbol in symbols]
if __name__ == '__main__':
    # run_tasks()
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # for symbol in symbols:
    #     get_price(symbol)
    end = time.time()
    total_time = end - start
    print("It took {} seconds to make {} API calls".format(total_time, len(symbols)))
The multithreaded code produced this output:
It took 19.715637922286987 seconds to make 30 API calls
Sequential:
It took 15.80090594291687 seconds to make 30 API calls
CodePudding user response:
You'd better recheck the durations. I ran the same code and these are the outputs I got.
Program with threads -> It took 1.6476280689239502 seconds to make 30 API calls
Sequential program -> It took 17.5554039478302 seconds to make 30 API calls
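To see why the threaded version is expected to win for I/O-bound work, here is a minimal sketch that swaps the API call for time.sleep, so it needs no network access or API key. The fake_api_call function and the 0.05-second latency are made up for illustration and are not part of the original script.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.05  # assumed per-request delay in seconds (illustrative only)

def fake_api_call(symbol):
    """Hypothetical stand-in for requests.get(); sleeps instead of using the network."""
    time.sleep(SIMULATED_LATENCY)
    return {"Symbol": symbol}

def run_sequential(symbols):
    # One call at a time: total time is roughly len(symbols) * latency
    start = time.perf_counter()
    results = [fake_api_call(s) for s in symbols]
    return results, time.perf_counter() - start

def run_threaded(symbols, max_workers=10):
    # Calls overlap while each thread is blocked waiting on "I/O"
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_api_call, symbols))  # map preserves input order
    return results, time.perf_counter() - start

symbols = ['AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA'] * 2
seq_results, seq_time = run_sequential(symbols)
thr_results, thr_time = run_threaded(symbols)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

If a run against the real API does not show a similar gap, the bottleneck is probably somewhere other than the threading code itself, such as the network or the server.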
CodePudding user response:
Thank you to everyone for responding. I figured out the issue with my code. Session objects in the requests library are not guaranteed to be thread-safe (more on that here: https://github.com/psf/requests/issues/2766).
I believe my program was slower because the threads were trying to share the same requests session. To fix this, I made the following change:
import threading

# Holds one independent set of attributes per thread
thread_local = threading.local()

def get_price(symbol):
    session = create_session()
    print(f'getting {symbol} price')
    response = session.get(url.format(symbol, api_key))
    print(f'price for {symbol} retrieved')

def create_session():
    # Create a session the first time this thread asks for one, then reuse it
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session
I created a threading.local() object and used it to give each thread that calls get_price() its own session.
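Here is a self-contained sketch of the same thread-local pattern, run through a thread pool. FakeSession is a made-up stand-in for requests.Session so the example runs without the requests package or an API key, and the example.invalid URL and the created_sessions list are only there for the demo. With the real library you would construct requests.Session() instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeSession:
    """Hypothetical stand-in for requests.Session (not part of any library)."""
    def get(self, url):
        return {"url": url, "thread": threading.get_ident()}

thread_local = threading.local()
created_sessions = []            # demo bookkeeping only
created_lock = threading.Lock()  # guards the bookkeeping list

def get_session():
    # Attributes set on thread_local are visible only to the thread that set them
    if not hasattr(thread_local, "session"):
        thread_local.session = FakeSession()
        with created_lock:
            created_sessions.append(thread_local.session)
    return thread_local.session

def get_price(symbol):
    session = get_session()
    # Placeholder URL, assumed for illustration
    return session.get(f"https://example.invalid/query?symbol={symbol}")

symbols = ['AAPL', 'GOOG', 'TSLA', 'MSFT', 'BABA'] * 6
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(get_price, symbols))

print(f"{len(results)} calls used {len(created_sessions)} session(s)")
```

With one Thread per symbol, as in the script above, each thread calls get_price() only once, so every call still builds a fresh session. A pool with a fixed number of workers lets each worker reuse its session across several calls.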