KeyError: 'driver' with Scrapy and Selenium together when I run my spider from the main file

Time:09-18

I have a problem with my Python script. When I run my spider directly with scrapy runspider Myspider it works, but when I run it from the main file I get this error: KeyError: 'driver'

Settings file :

SELENIUM_DRIVER_NAME = 'chrome'
#SELENIUM_DRIVER_EXECUTABLE_PATH = '/home/PATH/OF/FILE/chromedriver'
SELENIUM_DRIVER_ARGUMENTS=['--headless']

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

My spider file :

class MySpider(scrapy.Spider):
    name = 'my_spider'
    
    
    def __init__(self, list_urls, *args, **kwargs):
        super(my_spider, self).__init__(*args, **kwargs)
        self.urls = list_urls

    def start_requests(self):
        for url in self.urls:
            yield SeleniumRequest(
                url = url['link'],
                callback = self.parse,
                wait_time = 15,
            )

and my main file :

import scrapy
import classListUrls
from scrapy.crawler import CrawlerProcess
from dir.spiders import Spider


URL = "example.com"
urls = classListUrls.GenListUrls(URL)

process = CrawlerProcess()
process.crawl(Spider.my_spider, list_urls = urls.list_urls())
process.start()

I don't understand why this error occurs.

CodePudding user response:

One problem I see is that the first argument to process.crawl should be the spider class (MySpider) rather than the spider name (my_spider).

process.crawl(Spider.MySpider, list_urls=urls.list_urls())

The same applies to the super() call in the spider's __init__: it should reference the class MySpider, not my_spider. The better option is to call super() with no arguments, since Python 3 then uses the current class and instance by default.

class MySpider(scrapy.Spider):
    name = 'my_spider'

    def __init__(self, *args, list_urls=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.urls = list_urls or []
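
To see why the zero-argument form is safer, here is a minimal sketch that runs without Scrapy. Base is a hypothetical stand-in for scrapy.Spider, and the class names Broken and Fixed are made up for illustration; only the behaviour of super() is the point.

```python
class Base:
    """Hypothetical stand-in for scrapy.Spider: accepts a name plus extra kwargs."""

    def __init__(self, name=None, **kwargs):
        self.name = name


class Broken(Base):
    def __init__(self, *args, **kwargs):
        # 'broken' is not a defined name (the class is 'Broken'),
        # so this line raises NameError when the class is instantiated.
        super(broken, self).__init__(*args, **kwargs)


class Fixed(Base):
    def __init__(self, *args, list_urls=None, **kwargs):
        # Zero-argument super() needs no class name, so it cannot go stale.
        super().__init__(*args, **kwargs)
        self.urls = list_urls or []


try:
    Broken(name="my_spider")
except NameError as exc:
    print("Broken:", exc)

spider = Fixed(name="my_spider", list_urls=[{"link": "https://example.com"}])
print("Fixed:", spider.name, spider.urls)
```

Making list_urls keyword-only with a default of None also means the class can still be instantiated when no URL list is passed.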

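Another thing is that CrawlerProcess needs to be constructed with a settings object, because when it is created with no arguments it does not read the project's settings.py file. Without those settings the Selenium middleware is not enabled, which is the likely reason the 'driver' key is missing. You can pass the settings explicitly as shown here, or load the project settings with scrapy.utils.project.get_project_settings() when the script runs inside the Scrapy project.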

process = CrawlerProcess(settings={"SELENIUM_DRIVER_NAME": 'chrome',
                                   "SELENIUM_DRIVER_ARGUMENTS": ['--headless'],
                                   "DOWNLOADER_MIDDLEWARES": {'scrapy_selenium.SeleniumMiddleware': 800}})
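
Putting the pieces together, a main file could look roughly like the sketch below. It assumes the layout from the question (a dir.spiders.Spider module containing MySpider); the function names selenium_settings and run are hypothetical and exist only to keep the example organized.

```python
def selenium_settings():
    """Return the scrapy-selenium settings from the question as a plain dict."""
    return {
        "SELENIUM_DRIVER_NAME": "chrome",
        "SELENIUM_DRIVER_ARGUMENTS": ["--headless"],
        "DOWNLOADER_MIDDLEWARES": {"scrapy_selenium.SeleniumMiddleware": 800},
    }


def run(list_urls):
    """Start a crawl with the Selenium settings applied (hypothetical helper)."""
    # Imported inside the function so selenium_settings() can be used
    # on its own, without Scrapy or the project package being loaded.
    from scrapy.crawler import CrawlerProcess
    from dir.spiders import Spider  # module path taken from the question

    process = CrawlerProcess(settings=selenium_settings())
    # Pass the spider class itself, and the URL list as a keyword argument.
    process.crawl(Spider.MySpider, list_urls=list_urls)
    process.start()  # blocks until the crawl finishes
```

In the question's main file, this would be called as run(urls.list_urls()) after building urls with classListUrls.GenListUrls(URL).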