So I have code like this:
url_list = ['a.com', 'b.com', 'c.com', 'd.com', 'e.com']  # etc.

proxy_list = ['1.1.1.1', '2.2.2.2']  # etc.
import threading

import requests
from bs4 import BeautifulSoup

def extract(url, proxy):
    print(f'Thread Name : {threading.current_thread().name}')
    print(f'We are using this proxy : {proxy}')
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0'}
    try:
        # Route both HTTP and HTTPS traffic through the given proxy
        r = requests.get(url, headers=headers, proxies={'http': proxy, 'https': proxy}, timeout=2)
        soup = BeautifulSoup(r.text, 'html.parser')
        page_title = soup.find('title').text.strip()
        print(page_title)
    except Exception:
        # Skip sites that fail or time out through this proxy
        pass
I need to keep calling the extract function until every site in the list is finished. From what I know there is concurrent.futures in Python, so here is what I tried:
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(extract, url_list, proxy_list)
The problem is, say I have only 2 valid proxies and 5 sites: the code stops after just 2 sites, because executor.map stops as soon as the shorter iterable (the proxy list) runs out.
How do I solve this? What I want is for each thread to have its own proxy and keep taking tasks until every site in the list is finished.
Thanks
CodePudding user response:
Use itertools.cycle to create an iterator that repeats your proxies indefinitely; executor.map will then keep pairing URLs with proxies until the URL list itself is exhausted:
import itertools
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(extract, url_list, itertools.cycle(proxy_list))
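To see why this works, here is a minimal sketch (with made-up url_list and proxy_list values) of how the pairing behaves. executor.map matches up its iterables the same way zip does, stopping at the shorter one, so cycling the proxies keeps the run going until the URLs are consumed:

import itertools

url_list = ['a.com', 'b.com', 'c.com', 'd.com', 'e.com']  # hypothetical sites
proxy_list = ['1.1.1.1', '2.2.2.2']                       # hypothetical proxies

# zip stops at the shorter iterable, so only 2 pairs come out here
print(list(zip(url_list, proxy_list)))

# cycle makes the proxy list endless, so every URL gets a proxy,
# wrapping around: 1.1.1.1, 2.2.2.2, 1.1.1.1, 2.2.2.2, 1.1.1.1
print(list(zip(url_list, itertools.cycle(proxy_list))))

Since the cycled proxy list is infinite, the only thing that ends the map call is url_list running out, which is exactly the behavior you want.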