I am trying to run this practice Scrapy code, but it keeps failing with the error TypeError: 'Selector' object is not iterable.
Here is the code:
from scrapy import Spider
class WikiSpider(Spider):
    name = 'wiki'
    allowed_domains = ['wikipedia.com']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States']
    def parse(self, response):
        Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]')[0]
        for tabel in Tabel:
            state=tabel.xpath('.//tbody/tr/th/a/text()')[1:].extract()
            yield{
                state
            }
Here is the error message:
2021-10-07 04:23:39 [scrapy.core.engine] INFO: Spider opened
2021-10-07 04:23:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-10-07 04:23:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2021-10-07 04:23:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://en.wikipedia.org/robots.txt> (referer: None)
2021-10-07 04:23:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States> (referer: None)
2021-10-07 04:23:41 [scrapy.core.scraper] ERROR: Spider error processing <GET https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States> (referer: None)
Traceback (most recent call last):
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
yield next(it)
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
return next(self.data)
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
return next(self.data)
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
for r in iterable:
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
for x in result:
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
for r in iterable:
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 342, in <genexpr>
return (_set_referer(r) for r in result or ())
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
for r in iterable:
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 40, in <genexpr>
return (r for r in result or () if _filter(r))
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
for r in iterable:
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "C:\Users\Abu Bakar Siddique\AppData\Local\Programs\Python\Python39\lib\site-packages\scrapy\core\spidermw.py", line 56, in _evaluate_iterable
for r in iterable:
File "D:\tutorials\WEB scrapping\web scraping practice projects\wikipedia\wikipedia\spiders\wiki.py", line 17, in parse
for tabel in Tabel:
TypeError: 'Selector' object is not iterable
Thanks in advance for awesome support
CodePudding user response:
When you write Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]'), you get a SelectorList, that is, a list of Selector objects.
The [0] at the end of your line picks the first element of that list, which is a single Selector. A single Selector cannot be looped over, which is why you get that exception.
Change
Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]')[0]
to
Tabel=response.xpath('//table[contains(@class,"wikitable sortable")]')
CodePudding user response:
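Because you are missing a key in the yield: {state} is a set literal, not a dict, so the value needs a key such as 'state'. The code below also drops the [0] so that Tabel stays iterable: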
from scrapy import Spider


class WikiSpider(Spider):
    name = 'wiki'
    # Must match the domain used in start_urls (wikipedia.org, not wikipedia.com).
    allowed_domains = ['wikipedia.org']
    start_urls = [
        'https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States']

    def parse(self, response):
        # No [0] here: keep the SelectorList so the loop below can iterate over it.
        Tabel = response.xpath(
            '//table[contains(@class,"wikitable sortable")]')
        for tabel in Tabel:
            state = tabel.xpath('.//tbody/tr/th/a/text()')[1:].extract()
            # Yield a dict (key: value), not a set.
            yield {
                'state': state
            }
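The missing-key point can be checked in plain Python, without Scrapy. This is only a small sketch; the sample values in state are made up for illustration.

```python
# Hypothetical sample of what .extract() returns: a list of strings.
state = ["Alabama", "Alaska"]

# {state} is a set literal. A list is unhashable, so building the set fails
# before Scrapy ever sees the item.
try:
    item = {state}
    set_error = None
except TypeError as exc:
    set_error = str(exc)

# {'state': state} is a dict, which is a valid item to yield from parse().
item = {"state": state}

print(set_error)
print(item)
```

So even after the Selector error is fixed, the original yield{ state } would still fail until it is given a key.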