How to get the currentThread().getName() as part of (concurrent.futures) future result?

I have the following code with questions at the bottom:

1  from threading import currentThread
2  import concurrent.futures
3  import urllib.request
4 
5  URLS = ['http://www.foxnews.com/',
6          'http://www.cnn.com/',
7          'http://www.bbc.co.uk/']
8 
9 
10
11 # Retrieve a single page and report the URL and contents
12 def load_url(url, timeout):
13    with urllib.request.urlopen(url, timeout=timeout) as conn:
14
15        print(currentThread().getName(), url)
16        # how do I pass back the thread_name with the "conn.read" back to executor.submit? 
17        return conn.read()
18
19
20 # We can use a with statement to ensure threads are cleaned up promptly
21 with concurrent.futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix='url_thread') as executor:
22    # Start the load operations and mark each future with its URL
23    futures_dict = {executor.submit(load_url, url, 60): url for url in URLS}
24    for future in concurrent.futures.as_completed(futures_dict):
25        url = futures_dict[future]
26        try:
27            data = future.result()
28        except Exception as exc:
29            # print('%r generated an exception: %s' % (url, exc))
30            print(f'Thread Name {currentThread().getName()}: {url} generated an exception {exc}')
31        else:
32            # print('%r page is %d bytes' % (url, len(data)))
33            print(f'Thread Name {currentThread().getName()}: content length of {url} = {len(data)} bytes')

My output is as follows:

url_thread_1 http://www.cnn.com/
url_thread_0 http://www.foxnews.com/
url_thread_2 http://www.bbc.co.uk/
Thread Name MainThread: content length of http://www.foxnews.com/ = 282968 bytes
Thread Name MainThread: content length of http://www.cnn.com/ = 1115357 bytes
Thread Name MainThread: content length of http://www.bbc.co.uk/ = 363642 bytes

Questions:

The above shows the executor.submit target function load_url returns conn.read() (line 17).

If I call print(currentThread().getName()) inside the with ThreadPoolExecutor block (lines 30 and 33), it always shows "MainThread".

But when I call print(currentThread().getName()) inside the target function load_url (see line 15), it correctly shows url_thread_0, url_thread_1, etc. (Note that I am passing max_workers=3 and thread_name_prefix='url_thread' as parameters to ThreadPoolExecutor.)

How do I pass the currentThread().getName() value from line 15 back along with the conn.read() result from line 17, so that I can display the correct thread name in line 30 or line 33?

CodePudding user response:

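The as_completed loop runs in the main thread, so currentThread() there always reports MainThread; the worker's name has to be captured inside load_url, where the worker thread is actually running. I would refactor the load_url function to return a tuple (or, better yet, a namedtuple or an object) like this: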

from threading import currentThread
import concurrent.futures
import urllib.request

URLS = ['https://www.foxnews.com/',
        'https://www.cnn.com/',
        'https://www.bbc.co.uk/']
 
 
def load_url(url, timeout):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return url, currentThread().getName(), conn.read(), None
    except Exception as exc:
        return url, currentThread().getName(), None, exc


with concurrent.futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix='url_thread') as executor:
    futures = (executor.submit(load_url, url, 60) for url in URLS)
    for future in concurrent.futures.as_completed(futures):
        url, thread_name, data, exc = future.result()

        if exc:
            print(f'Thread Name {thread_name}: {url} generated an exception {exc}')
        else:
            print(f'Thread Name {thread_name}: content length of {url} = {len(data)} bytes')
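
For the namedtuple variant mentioned above, a minimal sketch might look like the following. The names FetchResult, run_in_worker, fake_fetch and fetch_all are hypothetical and not part of the original code, and fake_fetch is only a stand-in for the real urllib.request.urlopen(...).read() call so the example runs without network access. It also uses current_thread().name, the spelling that replaces currentThread().getName(), which has been deprecated since Python 3.10.

```python
import concurrent.futures
from collections import namedtuple
from threading import current_thread

# Hypothetical result type; the field names are illustrative only.
FetchResult = namedtuple("FetchResult", ["url", "thread_name", "data", "exc"])


def run_in_worker(fetch, url):
    """Call fetch(url) and bundle the outcome with the worker thread's name."""
    name = current_thread().name  # evaluated inside the worker thread
    try:
        return FetchResult(url, name, fetch(url), None)
    except Exception as exc:  # kept broad here only to keep the sketch short
        return FetchResult(url, name, None, exc)


def fake_fetch(url):
    """Stand-in for the real download (assumption: no network in this sketch)."""
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return url.encode() * 10


def fetch_all(urls, fetch=fake_fetch, max_workers=3):
    """Run fetch for every URL in a thread pool and collect FetchResult items."""
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=max_workers, thread_name_prefix="url_thread"
    ) as executor:
        futures = [executor.submit(run_in_worker, fetch, url) for url in urls]
        return [f.result() for f in concurrent.futures.as_completed(futures)]


if __name__ == "__main__":
    for r in fetch_all(["http://a.example/", "http://bad.example/"]):
        if r.exc:
            print(f"Thread Name {r.thread_name}: {r.url} generated an exception {r.exc}")
        else:
            print(f"Thread Name {r.thread_name}: content length of {r.url} = {len(r.data)} bytes")
```

Named fields make the consuming loop a little easier to read than positional unpacking, and adding a field later does not break existing callers that access results by name.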

Also move the exception handling into the function (and, better yet, catch only the specific exceptions you expect rather than a blanket Exception).
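
One way to narrow the except clause, as a rough sketch: catch urllib.error.URLError (which also covers HTTPError) and TimeoutError, and let everything else propagate. The opener parameter is a hypothetical addition, not part of the original code, that allows the function to be exercised without real network access. Treating TimeoutError as the timeout type assumes Python 3.10 or later, where socket.timeout is an alias of it.

```python
import urllib.error
import urllib.request
from threading import current_thread


def load_url(url, timeout, opener=urllib.request.urlopen):
    """Return (url, thread_name, data, exc); only network errors are captured."""
    thread_name = current_thread().name  # name of the thread running this call
    try:
        with opener(url, timeout=timeout) as conn:
            return url, thread_name, conn.read(), None
    except (urllib.error.URLError, TimeoutError) as exc:
        # URLError also covers HTTPError; any other exception propagates.
        return url, thread_name, None, exc
```

Any exception that is not caught inside load_url is stored on the future and re-raised by future.result() in the thread that calls it, so genuine bugs still surface instead of being reported as download failures.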