'NoneType' object is not iterable in scraper


I am trying to write a scraper for the "free-proxy.cz" website, but I am running into a problem.

I know my "port" section is wrong, but I don't know what the problem is or how to fix it.

Here is the code:

import requests
from bs4 import BeautifulSoup
import base64


urls = ['http://free-proxy.cz/en/proxylist/country/all/socks5/date/all',
       'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/2',
       'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/3',
       'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/4',
       'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/5',
]

for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    table = soup.find('table', {'id': 'proxy_list'})
    for row in table.find('tbody').find_all('tr'):
        for ip in row.find('script'):
            text=base64.b64decode(ip[29:-2:])
        for port in row.find('span', attrs='fport'):
            print(port.get_text())
    #ipadd=print(prt.decode('utf-8') ':' ports)

** I commented out the last line because the port grabber is not working correctly.

The result of running the above code is:

Traceback (most recent call last):
  File "LOCATION\main.py", line 22, in <module>
    for port in row.find('span', attrs='fport'):
TypeError: 'NoneType' object is not iterable
80
45554
1080
1080

What is the issue here?

CodePudding user response:

row.find('span', attrs='fport') returns None for rows that have no matching span, and iterating over None raises the TypeError. Guard against it before looping:

    span_rows = row.find('span', attrs='fport')
    if span_rows is not None:
        for port in span_rows:
            print(port.get_text())
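For reference, here is a minimal sketch of how that guard can be folded into the original requests/BeautifulSoup loop so it prints ip:port pairs, which appears to be what the commented-out last line was after. The slice [29:-2] on the script text is taken from the question; the proxy_list id and the fport class are assumed to match the page as described there.

import base64

import requests
from bs4 import BeautifulSoup

url = 'http://free-proxy.cz/en/proxylist/country/all/socks5/date/all'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

table = soup.find('table', {'id': 'proxy_list'})
for row in table.find('tbody').find_all('tr'):
    script = row.find('script')              # holds the Base64-encoded IP (per the question)
    span = row.find('span', attrs='fport')   # holds the port; missing on some rows
    if script is None or span is None:
        continue                             # skip rows without an IP or a port
    ip = base64.b64decode(script.get_text()[29:-2]).decode('utf-8')
    print(ip + ':' + span.get_text())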

CodePudding user response:

Try the code below; it works fine in my case.

from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver") #Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url ='http://free-proxy.cz/en/proxylist/country/all/socks5/date/all/{p}'

for p in range(1,6):
    driver.get(url.format(p=p))
    driver.maximize_window()
    time.sleep(3)
    soup = BeautifulSoup(driver.page_source,"html.parser")
  
    for row in soup.select('#proxy_list tbody tr'):
        port = row.select_one('td:nth-child(2) span')
        port = port.get_text() if port else None
        print(port)

Output:

44763
48372
9050 
5446 
None 
9050 
10001
16894
3389 
32015
9991 
4145 
8047 
33427
8000 
8036 
1080 
7302
12345
3128
58411
1080
4145
4145
None
64817
1080
54154
55090
26493
9050
7497
1080
7302
2210
15864
18438
18240

... and so on
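A small refinement to the Selenium approach, in case the fixed time.sleep(3) ever proves flaky: wait explicitly for the proxy table instead. This is just a sketch that reuses the driver and url from the snippet above and assumes the table keeps its proxy_list id.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

for p in range(1, 6):
    driver.get(url.format(p=p))
    # wait up to 10 seconds for the proxy table to appear, then parse as before
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "proxy_list"))
    )
    soup = BeautifulSoup(driver.page_source, "html.parser")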
