Using proxies with playwright in python

Time:10-21

I'm using Playwright to extract data from a website, and I want to use proxies that I get from this website: https://www.proxy-list.download/HTTPS. It doesn't work, and I'm wondering whether this is because the proxies are free. If that is the reason, does anyone know where I can find proxies that will work?

This is my code:

from playwright.sync_api import sync_playwright
import time


url = "https://www.momox-shop.fr/livres-romans-et-litterature-C055/"
with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=False,
        proxy= {
            'server': '209.166.175.201:3128'
        })
    page = browser.new_page()
    page.goto(url)
    time.sleep(5)

Thank you!

Answer:

Yes. The proxies listed at your link all appear to be "dead".

Before using proxies, check that they work. Here is one possible solution:

import json
import requests
from pythonping import ping
from concurrent.futures import ThreadPoolExecutor


check_proxies_url = "https://httpbin.org/ip"
good_proxy = set()

# Alternative source: the list from the question.
# proxy_lst = requests.get("https://www.proxy-list.download/api/v1/get", params={"type": "https"})
# proxies = [proxy for proxy in proxy_lst.text.split('\r\n') if proxy]

# Each non-empty line of this list is a JSON object with "host" and "port" keys.
proxy_lst = requests.get("http://proxylist.fatezero.org/proxy.list")
entries = (json.loads(line) for line in proxy_lst.text.split('\n') if line)
proxies = [f"{entry['host']}:{entry['port']}" for entry in entries]


def check_proxy(proxy):
    """Add proxy to good_proxy if it relays a request and pings in under 150 ms."""
    proxy_map = {"https": proxy, "http": proxy}
    try:
        response = requests.get(url=check_proxies_url, proxies=proxy_map, timeout=2)
        response.raise_for_status()
        # pythonping sends raw ICMP packets, which may require root/administrator rights.
        if ping(target=proxy.split(':')[0], count=1, timeout=2).rtt_avg_ms < 150:
            good_proxy.add(proxy)
            print(f"Good proxy: {proxy}")
    except Exception:
        print(f"Bad proxy: {proxy}")


# Check the proxies concurrently; leaving the with block waits for all checks to finish.
with ThreadPoolExecutor() as executor:
    executor.map(check_proxy, proxies)

print(good_proxy)

This produces the set of proxies that respond and have a ping below 150 ms.
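
The two proxy sources in the script above return different formats: the proxy-list.download API returns one host:port entry per line, while the fatezero list returns one JSON object per line. A minimal sketch of a parser that accepts both follows. The function name parse_proxy_lines is hypothetical, and the sample input is made up for illustration.

```python
import json


def parse_proxy_lines(text):
    """Return a list of 'host:port' strings from either list format.

    Assumes each non-empty line is either a plain 'host:port' string or a
    JSON object with 'host' and 'port' keys; anything else is skipped.
    """
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("{"):
            try:
                entry = json.loads(line)
                result.append(f"{entry['host']}:{entry['port']}")
            except (json.JSONDecodeError, KeyError):
                continue  # malformed or incomplete JSON line, ignore it
        elif ":" in line:
            result.append(line)
    return result


# Made-up sample input mixing both formats (documentation-range addresses).
sample = '192.0.2.10:3128\r\n{"host": "192.0.2.20", "port": 8080}\n\nnot a proxy\n'
print(parse_proxy_lines(sample))
```

Using splitlines() instead of split('\r\n') or split('\n') means the same code handles both line-ending styles.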

Output:

{'209.166.175.201:8080', '170.39.194.156:3128', '20.111.54.16:80', '20.111.54.16:8123'}

In any case, these are shared proxies and their performance is not guaranteed. If you want to be sure that your scraper will keep working, it is better to buy a proxy.
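
Paid proxies usually require authentication. Playwright's proxy setting accepts optional username and password keys alongside server. Below is a minimal sketch under stated assumptions: proxy.example.com, my_user and my_password are placeholders, and proxy_settings and open_with_proxy are illustrative helpers, not part of Playwright.

```python
def proxy_settings(address, username=None, password=None):
    """Build the dict passed as proxy= to Playwright's launch()."""
    # Make the scheme explicit so it is clear this is an HTTP proxy.
    server = address if "://" in address else f"http://{address}"
    settings = {"server": server}
    if username is not None:
        settings["username"] = username
        settings["password"] = password or ""
    return settings


def open_with_proxy(url, settings):
    """Open url in Firefox through the given proxy and return the page title."""
    # Imported here so that building the settings does not require Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True, proxy=settings)
        try:
            page = browser.new_page()
            page.goto(url, timeout=30_000)
            return page.title()
        finally:
            browser.close()


# Placeholder values; replace them with the ones from your provider.
settings = proxy_settings("proxy.example.com:3128", "my_user", "my_password")
print(settings)
```

Calling open_with_proxy(url, settings) with the URL from the question would then load the page through the authenticated proxy.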

I ran your code with one of the proxies returned by the check, '170.39.194.156:3128', and for now it works.
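
Since a free proxy can stop working at any moment, one option is to try each proxy that passed the check until a page load succeeds. This is a rough sketch, not a definitive implementation: first_working and load_through_proxy are hypothetical names, and the target URL is the one from the question.

```python
def first_working(candidates, try_load):
    """Return the first proxy for which try_load(proxy) succeeds, else None."""
    for proxy in candidates:
        try:
            try_load(proxy)
            return proxy
        except Exception as exc:  # a dead proxy usually shows up as a navigation error
            print(f"{proxy} failed: {exc}")
    return None


def load_through_proxy(proxy, url="https://www.momox-shop.fr/livres-romans-et-litterature-C055/"):
    """Load url in Firefox through proxy; raises if navigation fails."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(proxy={"server": proxy})
        try:
            browser.new_page().goto(url, timeout=15_000)
        finally:
            browser.close()


# Usage (needs Playwright installed and network access):
# good_proxy = {'170.39.194.156:3128', '20.111.54.16:80'}
# print(first_working(good_proxy, load_through_proxy))
```

Passing the loader as an argument keeps the selection logic separate from the browser call, so it can be exercised with a stand-in function.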
