I'm using Scrapy in Google Colab, but I always get a ReactorNotRestartable error.
First I installed Scrapy using pip and then I used this code:
import scrapy
from scrapy.crawler import CrawlerProcess


class TestSpider(scrapy.Spider):
    name = "test"

    def start_requests(self):
        yield scrapy.Request("A valid URL")

    def parse(self, response):
        products = response.css("div.product-card")
        for item in products:
            yield {
                # Query the current card, not the whole list of cards.
                "price": item.css("div.price-range::text").get(),
            }


process = CrawlerProcess(settings={
    "FEED_URI": "test.csv",
    "FEED_FORMAT": "csv",
})
process.crawl(TestSpider)
process.start()
I was following a tutorial about "How to use Scrapy in a Python script", but my code is not working.
Why am I getting a "ReactorNotRestartable" error when using Scrapy?
CodePudding user response:
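You are getting this error because you are running the code in a Colab/Jupyter notebook. The notebook keeps the same Python process alive between cell runs, and the Twisted reactor cannot be restarted once it has stopped. process.start() starts the reactor and stops it when the crawl finishes, so running the cell a second time in the same session raises ReactorNotRestartable. You can solve this in one of these ways:
- Restart your notebook runtime. You have to restart it each time before you run the spider again.
- Or run your spider locally as a regular Python script (not in Jupyter).
- Or use crochet. Check this answer for how to set it up.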
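
The first two options come down to the same thing: every crawl needs a Python process whose reactor has never been started. As a rough sketch of that idea (not taken from the tutorial or from Scrapy itself), you could save the spider code to a file and launch it in a new interpreter from a notebook cell. The helper name run_script_in_fresh_process and the file name test_spider.py below are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def run_script_in_fresh_process(script_path, timeout=600):
    """Run a Python script in a new interpreter and return the CompletedProcess.

    Hypothetical helper: a new process starts with its own, never-started
    Twisted reactor, so a script that calls process.start() can be run
    repeatedly without restarting the notebook.
    """
    script = Path(script_path)
    if not script.is_file():
        raise FileNotFoundError(f"No such script: {script}")
    # sys.executable is the interpreter running the current kernel, so the
    # child process sees the same installed packages (including Scrapy).
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the captured output to str
        timeout=timeout,      # raise TimeoutExpired if the run hangs
    )
```

Assuming the spider and the CrawlerProcess code are saved as test_spider.py, calling result = run_script_in_fresh_process("test_spider.py") can be repeated as often as needed. result.returncode is 0 when the script exits normally, and Scrapy's log output typically ends up in result.stderr.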