Python Concurrent Futures, Each Thread, Each Proxy


So I have code like this:

List of sites:

a.com
b.com
c.com
d.com
e.com
etc

List of proxies:

1.1.1.1
2.2.2.2
etc

import threading

import requests
from bs4 import BeautifulSoup

def extract(url, proxy):
    print(f'Thread Name : {threading.current_thread().name}')
    print(f'We are using this proxy : {proxy}')
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0'}
    try:
        r = requests.get(url, headers=headers, proxies={'http': proxy, 'https': proxy}, timeout=2)
        soup = BeautifulSoup(r.text, 'html.parser')
        page_title = soup.find('title').text.strip()
        print(page_title)
    except Exception:
        pass  # ignore request/parse errors and move on to the next site

I need to loop the extract function until every site in the list is finished. From what I know there is concurrent.futures in Python, so here is what I tried:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(extract, url_list, proxy_list)

The problem is: say I only have 2 valid proxies but 5 sites. The code stops after processing just 2 sites.

How do I solve this? What I want is for each thread to have its own proxy and to keep working until every site in the list is finished.

Thanks

CodePudding user response:

Use itertools.cycle to create an iterator that repeats your proxies indefinitely. executor.map stops as soon as its shortest input iterable is exhausted, and a cycled iterator never runs out, so every URL in url_list gets paired with a proxy:

import itertools

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(extract, url_list, itertools.cycle(proxy_list))
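
If it helps to see the pairing that map will produce, zipping a URL list with the cycled proxies reproduces it. This is only an illustration; the url_list and proxy_list values below are made up from the example lists in the question.

import itertools

url_list = ['a.com', 'b.com', 'c.com', 'd.com', 'e.com']
proxy_list = ['1.1.1.1', '2.2.2.2']

# zip stops when url_list runs out, while cycle keeps supplying proxies,
# which is exactly how executor.map pairs the two iterables.
print(list(zip(url_list, itertools.cycle(proxy_list))))
# [('a.com', '1.1.1.1'), ('b.com', '2.2.2.2'), ('c.com', '1.1.1.1'),
#  ('d.com', '2.2.2.2'), ('e.com', '1.1.1.1')]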