I am trying to write a scraper for the "free-proxy.cz" website, but I have run into a problem.
I know the "port" section of my code is wrong, but I don't know what the problem is or how to fix it.
Here is the code:
import requests
from bs4 import BeautifulSoup
import base64

urls = ['http://free-proxy.cz/en/proxylist/country/all/socks5/date/all',
        'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/2',
        'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/3',
        'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/4',
        'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/5',
        ]

for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.find('table', {'id': 'proxy_list'})
    for row in table.find('tbody').find_all('tr'):
        for ip in row.find('script'):
            text = base64.b64decode(ip[29:-2:])
        for port in row.find('span', attrs='fport'):
            print(port.get_text())
        #ipadd=print(prt.decode('utf-8') ':' ports)
Note: I commented out the last line because the port grabber is not working correctly.
The result of running the above code is:
Traceback (most recent call last):
  File "LOCATION\main.py", line 22, in <module>
    for port in row.find('span', attrs='fport'):
TypeError: 'NoneType' object is not iterable
80
45554
1080
1080
What is the issue here?
CodePudding user response:
span_rows = row.find('span', attrs='fport')
if span_rows is not None:
    for port in span_rows:
        print(port.get_text())
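The traceback shows that row.find() returned None for at least one row, meaning that row has no span with class "fport", and a for loop cannot iterate over None. Below is a minimal sketch of how the IP and port could be combined while skipping such rows. It runs on an inline sample instead of the live site: SAMPLE_HTML and the extract_proxies helper are hypothetical, and the script format is only inferred from the ip[29:-2] slice in the question, so the real markup may differ.

```python
import base64
import re

from bs4 import BeautifulSoup

# Hypothetical sample that mimics the layout implied by the question's code.
# The real page markup is an assumption and may differ.
SAMPLE_HTML = """
<table id="proxy_list"><tbody>
<tr><td><script>document.write(Base64.decode("MTkyLjAuMi4x"))</script></td>
    <td><span class="fport">1080</span></td></tr>
<tr><td colspan="2">row without a script or port (e.g. an ad)</td></tr>
</tbody></table>
"""


def extract_proxies(html):
    """Return 'ip:port' strings, skipping rows that lack either part."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="proxy_list")
    if table is None:
        return []
    proxies = []
    for row in table.find_all("tr"):
        script = row.find("script")
        port = row.find("span", class_="fport")
        # find() returns None when nothing matches, so check before using it.
        if script is None or port is None:
            continue
        # Pull the quoted Base64 payload out of the script text.
        match = re.search(r'"([A-Za-z0-9+/=]+)"', script.string or "")
        if match is None:
            continue
        ip = base64.b64decode(match.group(1)).decode("utf-8")
        proxies.append(f"{ip}:{port.get_text(strip=True)}")
    return proxies


print(extract_proxies(SAMPLE_HTML))  # ['192.0.2.1:1080']
```

With the live site, the same function could be called as extract_proxies(r.text) inside the loop over urls.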
CodePudding user response:
Try the code below. It works fine for me.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/{p}'

for p in range(1, 6):
    driver.get(url.format(p=p))
    driver.maximize_window()
    time.sleep(3)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for row in soup.select('#proxy_list tbody tr'):
        port = row.select_one('td:nth-child(2) span')
        port = port.get_text() if port else None
        print(port)
Output:
44763
48372
9050
5446
None
9050
10001
16894
3389
32015
9991
4145
8047
33427
8000
8036
1080
7302
12345
3128
58411
1080
4145
4145
None
64817
1080
54154
55090
26493
9050
7497
1080
7302
2210
15864
18438
18240
... and so on
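The None entries in the output come from rows where the selector finds no span in the second cell. If those rows are not wanted, they can be skipped instead of printed. This is a small sketch on a hypothetical page source (PAGE_SOURCE stands in for driver.page_source, and ports_from_source is an illustrative helper name); it assumes every port cell contains only digits.

```python
from bs4 import BeautifulSoup

# Hypothetical page source; with Selenium this would be driver.page_source.
PAGE_SOURCE = """
<table id="proxy_list"><tbody>
<tr><td>ip</td><td><span class="fport">9050</span></td></tr>
<tr><td colspan="2">row without a port cell</td></tr>
<tr><td>ip</td><td><span class="fport">1080</span></td></tr>
</tbody></table>
"""


def ports_from_source(page_source):
    """Collect ports as integers, ignoring rows with no port span."""
    soup = BeautifulSoup(page_source, "html.parser")
    ports = []
    for row in soup.select("#proxy_list tbody tr"):
        span = row.select_one("td:nth-child(2) span")
        if span is None:
            continue  # no second cell or no span in it, so nothing to read
        ports.append(int(span.get_text(strip=True)))
    return ports


print(ports_from_source(PAGE_SOURCE))  # [9050, 1080]
```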