I have the following code with questions at the bottom:
1   from threading import currentThread
2   import concurrent.futures
3   import urllib.request
4
5   URLS = ['http://www.foxnews.com/',
6           'http://www.cnn.com/',
7           'http://www.bbc.co.uk/']
8
9
10
11  # Retrieve a single page and report the URL and contents
12  def load_url(url, timeout):
13      with urllib.request.urlopen(url, timeout=timeout) as conn:
14
15          print(currentThread().getName(), url)
16          # how do I pass back the thread_name with the "conn.read" back to executor.submit?
17          return conn.read()
18
19
20  # We can use a with statement to ensure threads are cleaned up promptly
21  with concurrent.futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix='url_thread') as executor:
22      # Start the load operations and mark each future with its URL
23      futures_dict = {executor.submit(load_url, url, 60): url for url in URLS}
24      for future in concurrent.futures.as_completed(futures_dict):
25          url = futures_dict[future]
26          try:
27              data = future.result()
28          except Exception as exc:
29              # print('%r generated an exception: %s' % (url, exc))
30              print(f'Thread Name {currentThread().getName()}: {url} generated an exception {exc}')
31          else:
32              # print('%r page is %d bytes' % (url, len(data)))
33              print(f'Thread Name {currentThread().getName()}: content length of {url} = {len(data)} bytes')
My output is as follows:
url_thread_1 http://www.cnn.com/
url_thread_0 http://www.foxnews.com/
url_thread_2 http://www.bbc.co.uk/
Thread Name MainThread: content length of http://www.foxnews.com/ = 282968 bytes
Thread Name MainThread: content length of http://www.cnn.com/ = 1115357 bytes
Thread Name MainThread: content length of http://www.bbc.co.uk/ = 363642 bytes
Questions:
The code above shows that the executor.submit target function load_url returns conn.read() (line 17).
If I call print(threading.currentThread().getName()) inside the with ThreadPoolExecutor statement (lines 30 and 33), it always shows "MainThread".
But when I call print(threading.currentThread().getName()) inside the target function load_url (see line 15), it correctly shows url_thread_0, url_thread_1, etc. (Please note that I am passing max_workers=3 and thread_name_prefix='url_thread' as parameters to ThreadPoolExecutor.)
How do I also pass along the currentThread().getName() from line 15 together with the conn.read() from line 17 as part of the futures_dict key, so that I can display the correct thread name at either line 30 or line 33?
Answer:
I would refactor the load_url function to return a tuple (or better yet a namedtuple or object) like this:
from threading import currentThread
import concurrent.futures
import urllib.request

URLS = ['https://www.foxnews.com/',
        'https://www.cnn.com/',
        'https://www.bbc.co.uk/']


def load_url(url, timeout):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return url, currentThread().getName(), conn.read(), None
    except Exception as exc:
        return url, currentThread().getName(), None, exc


with concurrent.futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix='url_thread') as executor:
    futures = (executor.submit(load_url, url, 60) for url in URLS)
    for future in concurrent.futures.as_completed(futures):
        url, thread_name, data, exc = future.result()
        if exc:
            print(f'Thread Name {thread_name}: {url} generated an exception {exc}')
        else:
            print(f'Thread Name {thread_name}: content length of {url} = {len(data)} bytes')
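The loop over as_completed runs in the main thread, which is why currentThread() reports MainThread there; the worker's name has to be captured inside load_url and carried back in the return value. As a rough sketch of the namedtuple variant mentioned earlier, something like the following could work. FetchResult, its field names, and the fetch_all helper are hypothetical names chosen for illustration, and threading.current_thread().name is used in place of currentThread().getName(), which is deprecated since Python 3.10.

```python
import concurrent.futures
import threading
import urllib.request
from typing import NamedTuple, Optional


class FetchResult(NamedTuple):
    """Hypothetical result record; the field names are illustrative."""
    url: str
    thread_name: str
    data: Optional[bytes]
    error: Optional[Exception]


def load_url(url: str, timeout: float) -> FetchResult:
    # Capture the worker thread's name here, inside the worker itself.
    name = threading.current_thread().name
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return FetchResult(url, name, conn.read(), None)
    except Exception as exc:
        # Mirror the tuple version: hand the exception back as data.
        return FetchResult(url, name, None, exc)


def fetch_all(urls, timeout=60, max_workers=3):
    """Yield one FetchResult per URL, in completion order."""
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=max_workers,
            thread_name_prefix='url_thread') as executor:
        futures = [executor.submit(load_url, url, timeout) for url in urls]
        for future in concurrent.futures.as_completed(futures):
            yield future.result()
```

Iterating with "for result in fetch_all(URLS):" then gives access to result.thread_name, result.url, result.data and result.error by name, which tends to be easier to read than unpacking a four-element tuple.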
This also moves the exception handling into the function. Better yet, specify which exceptions you want to catch rather than using a blanket Exception.
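To illustrate catching specific exceptions, here is a minimal sketch. It assumes the failures worth reporting are urllib.error.URLError (which also covers HTTPError) and socket.timeout; that choice, and the EXPECTED_ERRORS name, are assumptions for illustration rather than a complete list for every application.

```python
import socket
import threading
import urllib.error
import urllib.request

# Assumed set of "expected" network failures; adjust to your needs.
EXPECTED_ERRORS = (urllib.error.URLError, socket.timeout)


def load_url(url, timeout):
    # Record the name of the thread that actually performs the request.
    name = threading.current_thread().name
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return url, name, conn.read(), None
    except EXPECTED_ERRORS as exc:
        # Only anticipated network errors are returned as data.
        return url, name, None, exc
```

With this version, anything outside EXPECTED_ERRORS (for example the ValueError that urlopen raises for a malformed URL) is not swallowed by load_url. It is re-raised from future.result() in the main thread, so programming mistakes stay visible instead of being reported as ordinary download failures.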