I have two projects: one with plain Selenium and one using Scrapy-Selenium, which fits into the Scrapy spider format but uses Selenium for the automation.
I can get ChromeDriver to load the page I want with the basic Selenium program, but something about the second project (with Scrapy) prevents it from loading the URL. Instead it's stuck showing data:, in the URL bar.
First project (works fine):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path="./chromedriver")
driver.get("https://ricemedia.co")
Second project (doesn't load page):
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver
import time
class ExampleSpider(scrapy.Spider):
    name = 'rice'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=3,
            callback=self.parse
        )

    def parse(self, response):
        driver = webdriver.Chrome(executable_path="./chromedriver")
        driver.maximize_window()
        time.sleep(20)
I have browsed StackOverflow and Google, and the two most common causes are an outdated ChromeDriver and a missing http:// in the URL. Neither is the case for me. The path to chromedriver seems fine too (both projects are in the same folder, along with the same chromedriver). Since one works and the other doesn't, it must be something about my Scrapy-Selenium spider.
I should add that I installed Scrapy, Selenium and Scrapy-Selenium locally in my virtual environment with pip, and I doubt it's an installation issue.
Please help, thanks!
CodePudding user response:
You can use another way to install ChromeDriver:
First, install webdriver-manager with pip:
pip install webdriver-manager
(If you were on Java, an equivalent is available as a Maven dependency, but that is not relevant here.)
Then the code:
# selenium 3
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
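If you have upgraded to Selenium 4, the constructor signature changed; an equivalent sketch using the Service object would be:
# selenium 4 (assumes you are on Selenium 4; otherwise keep the snippet above)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))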
CodePudding user response:
According to the scrapy-selenium docs, SeleniumRequest must be run inside a Scrapy project. A few points:
- The ChromeDriver path does not work the way you have set it, i.e. driver = webdriver.Chrome(executable_path="./chromedriver"). You have to configure the ChromeDriver path in the project's settings.py file.
- You have to put the chromedriver executable (chromedriver.exe on Windows) in your project folder.
- No need for driver.maximize_window(), as scrapy-selenium works well in headless mode.
- No need for time.sleep(20) this way, since wait_time handles the waiting (if you need to wait for a specific element, see the wait_until sketch after the settings block below).
- You have to add the scrapy-selenium middleware and the executable path in the settings.py file, as follows.
Just copy and paste the following portion anywhere in your settings.py file:
#Middleware
DOWNLOADER_MIDDLEWARES = {
'scrapy_selenium.SeleniumMiddleware': 800
}
#Selenium
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
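If a fixed wait_time is not enough and you need to wait for a specific element, SeleniumRequest also accepts a wait_until expected condition. A minimal sketch of that variant (the <h1> locator and the spider name are placeholders for illustration only):
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

class WaitUntilSpider(scrapy.Spider):
    name = 'rice_wait'  # hypothetical name, for illustration only

    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=10,
            # hand the response to parse only once an <h1> is present
            wait_until=EC.presence_of_element_located((By.TAG_NAME, 'h1')),
            callback=self.parse
        )

    def parse(self, response):
        pass  # same as in the script below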
Script looks like:
import scrapy
from scrapy_selenium import SeleniumRequest
class ExampleSpider(scrapy.Spider):
    name = 'rice'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://ricemedia.co",
            wait_time=3,
            callback=self.parse
        )

    def parse(self, response):
        driver = response.meta['driver']
        # start coding...
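Inside parse you can either use Scrapy's normal selectors on the Selenium-rendered response, or keep driving the shared browser via response.meta['driver']. A minimal sketch of what the body could look like (the h1::text selector is only an assumed placeholder):
    def parse(self, response):
        driver = response.meta['driver']
        # the rendered HTML is already in response, so normal Scrapy selectors work
        for title in response.css('h1::text').getall():
            yield {'title': title}
        # or use the shared driver directly if you need further interaction
        self.logger.info(driver.current_url)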