I have two projects: one with plain Selenium and one using Scrapy-Selenium, which fits into the Scrapy spider format but uses Selenium for the automation.
I can get ChromeDriver to load the page I want with the basic Selenium program, but something about the second project (with Scrapy) prevents it from loading the URL. Instead it's stuck showing data:, in the URL bar.
First project (works fine):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path="./chromedriver")
driver.get("https://ricemedia.co")
Second project (doesn't load page):
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver
import time
class ExampleSpider(scrapy.Spider):
    name = 'rice'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=3,
            callback=self.parse
        )

    def parse(self, response):
        driver = webdriver.Chrome(executable_path="./chromedriver")
        driver.maximize_window()
        time.sleep(20)
I have browsed StackOverflow and Google, and the two most common causes are an outdated ChromeDriver and a missing http:// in the URL. Neither is the case for me. The path to chromedriver seems fine too (both projects are in the same folder, along with the same chromedriver). Since one works and the other doesn't, it must be something about my Scrapy-Selenium spider.
I should add that I installed Scrapy, Selenium and Scrapy-Selenium locally in my virtual environment with pip, and I doubt it's an installation issue.
Please help, thanks!
CodePudding user response:
You can use another way to install ChromeDriver:
First, install webdriver-manager with pip:
pip install webdriver-manager
(If you were on Java, an equivalent is available as a Maven dependency, but that is not relevant here.)
Then the code:
# selenium 3
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
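If you have upgraded to Selenium 4, the constructor signature changed; an equivalent sketch using the Service object would be:
# selenium 4 (assumes you are on Selenium 4; otherwise keep the snippet above)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))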
CodePudding user response:
According to the scrapy-selenium docs, SeleniumRequest must be run inside a Scrapy project. A few points:
- The ChromeDriver path does not work the way you have set it, i.e. driver = webdriver.Chrome(executable_path="./chromedriver"). You have to configure the ChromeDriver path in the project's settings.py file.
- You have to put the chromedriver executable (chromedriver.exe on Windows) in your project folder.
- No need for driver.maximize_window(), as scrapy-selenium works well in headless mode.
- No need for time.sleep(20) this way, since wait_time handles the waiting (if you need to wait for a specific element, see the wait_until sketch after the settings block below).
- You have to add the scrapy-selenium middleware and the executable path in the settings.py file, as follows.
Just copy and paste the following portion anywhere in your settings.py file:
#Middleware
DOWNLOADER_MIDDLEWARES = {
'scrapy_selenium.SeleniumMiddleware': 800
}
#Selenium
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
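If a fixed wait_time is not enough and you need to wait for a specific element, SeleniumRequest also accepts a wait_until expected condition. A minimal sketch of that variant (the <h1> locator and the spider name are placeholders for illustration only):
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

class WaitUntilSpider(scrapy.Spider):
    name = 'rice_wait'  # hypothetical name, for illustration only

    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=10,
            # hand the response to parse only once an <h1> is present
            wait_until=EC.presence_of_element_located((By.TAG_NAME, 'h1')),
            callback=self.parse
        )

    def parse(self, response):
        pass  # same as in the script below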
Script looks like:
import scrapy
from scrapy_selenium import SeleniumRequest
class ExampleSpider(scrapy.Spider):
    name = 'rice'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=3,
            callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']
        # start coding...
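Inside parse you can either use Scrapy's normal selectors on the Selenium-rendered response, or keep driving the shared browser via response.meta['driver']. A minimal sketch of what the body could look like (the h1::text selector is only an assumed placeholder):
    def parse(self, response):
        driver = response.meta['driver']
        # the rendered HTML is already in response, so normal Scrapy selectors work
        for title in response.css('h1::text').getall():
            yield {'title': title}
        # or use the shared driver directly if you need further interaction
        self.logger.info(driver.current_url)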