I am trying to scrape a dynamic website, but Selenium gives me this error: 'chromedriver' executable needs to be in PATH
Can you help me solve this problem?
from scrapy import Spider
from scrapy.http import Request
from scrapy.utils.project import get_project_settings
from selenium import webdriver
class AuthorSpider(Spider):
    name = 'pushpa'
    def start_requests(self):
        self.driver = webdriver.Chrome(executable_path='C:/Program Files (x86)/chromedriver')
        driver = webdriver.Chrome(driver_path, options=options)
        driver.get('https://www.lazada.com.ph/shop-laptops/')
        link_elements = driver.find_elements_by_xpath(
            '//*[@data-qa-locator="product-item"]//a[text()]')
        for link in link_elements:
            yield {
                'url': link
            }
CodePudding user response:
executable_path should be set to the absolute path of the chromedriver.exe file itself, not to the folder that contains it.
So, if your chromedriver.exe is inside the 'C:/Program Files (x86)/chromedriver' folder, it should be:
self.driver = webdriver.Chrome(executable_path='C:/Program Files (x86)/chromedriver/chromedriver.exe')
Also, I don't understand why you are creating and initializing two driver objects:
self.driver = webdriver.Chrome(executable_path='C:/Program Files (x86)/chromedriver')
driver = webdriver.Chrome(driver_path, options=options)
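The folder-versus-file mix-up described above can be caught before Selenium is started. The following is a minimal sketch using only the standard library; the helper name resolve_chromedriver and its exe_name parameter are hypothetical and not part of Selenium.

```python
from pathlib import Path


def resolve_chromedriver(candidate: str, exe_name: str = "chromedriver.exe") -> str:
    """Return the absolute path of the driver binary itself.

    `candidate` may point at the binary or at the folder that holds it.
    (Hypothetical helper for illustration; not a Selenium API.)
    """
    path = Path(candidate)
    # A folder was given: look for the binary inside it.
    if path.is_dir():
        path = path / exe_name
    # Fail early with a clear message instead of Selenium's PATH error.
    if not path.is_file():
        raise FileNotFoundError(f"No driver binary found at {path}")
    return str(path.resolve())
```

The returned string is what would be passed as executable_path, e.g. webdriver.Chrome(executable_path=resolve_chromedriver('C:/Program Files (x86)/chromedriver')), so only one driver object is needed. Newer Selenium 4 releases expect the path to be wrapped in selenium.webdriver.chrome.service.Service rather than passed as executable_path.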
CodePudding user response:
A more convenient solution is SeleniumRequest from the scrapy-selenium package. To use SeleniumRequest with Scrapy, you need a Scrapy project.
Script:
import scrapy
from scrapy_selenium import SeleniumRequest
class AuthorSpider(scrapy.Spider):
    name = 'pushpa'
    def start_requests(self):
        url = 'https://www.lazada.com.ph/shop-laptops/'
        yield SeleniumRequest(
            url=url,
            wait_time=5,
            callback=self.parse
        )
    def parse(self, response):
        link_elements = response.xpath('//*[@data-qa-locator="product-item"]//a[text()]/@href').getall()
        for link in link_elements:
            link = f'https:{link}'
            yield {
                'url': link
            }
Output:
{'url': 'https://www.lazada.com.ph/products/coreldraw-graphics-suite-x6-dvd-pc-installer-i1733548522-s7464446610.html?search=1'}
2022-03-04 02:47:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lazada.com.ph/shop-laptops/>
{'url': 'https://www.lazada.com.ph/products/laptop-hp-probook-4545s-amd-a4-4300m-4gb-ram-ddr3-250gb-hdd-radeon-hd-graphics-i1208954033-s13141803102.html?search=1'}
2022-03-04 02:47:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lazada.com.ph/shop-laptops/>
{'url': 'https://www.lazada.com.ph/products/gift-monitor-17inlaptop-for-sale-brand-new-9470m9480m-i-laptop-i5-i-light-and-portable-i-14in-i-fourth-generation-processor-i-core-intel-i5-i-16gb-ram-i-480gb-ssd-i-built-in-camera-hdmi-hd-interface-i-suitable-for-online-courses-learni-i2732325355-s13083117290.html?search=1&freeshipping=1'}
2022-03-04 02:47:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lazada.com.ph/shop-laptops/>
{'url': 'https://www.lazada.com.ph/products/acer-predator-helios-300-70bf-ph315-54-70bf-gaming-laptop-144hz-ips-panel-intel-core-i7-11800h-8-cores-rtx-3050ti-16gb-ram-512gb-ssd-pc-central-i2590081219-s12159342297.html?search=1'}
2022-03-04 02:47:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lazada.com.ph/shop-laptops/>
{'url': 'https://www.lazada.com.ph/products/free-air-fryerlaptop-i-l460-i-14in-i-6th-generation-processor-i-core-i5-i-4gb8gb16gb-memory-i-256gb-ssd480gb-ssd-i-compatible-with-windows10-suitable-for-learning-work-online-i2388508967-s10876939835.html?search=1&freeshipping=1'}
... and so on
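The f'https:{link}' step in the script assumes that every href is protocol-relative (starts with //), which matches the output shown. A slightly more defensive sketch resolves each href against the page URL with the standard library's urljoin, which also copes with absolute and path-relative links. The function name absolutize and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://www.lazada.com.ph/shop-laptops/'


def absolutize(hrefs, base=PAGE_URL):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(base, href) for href in hrefs]


# Made-up hrefs covering the three common shapes
sample = [
    '//www.lazada.com.ph/products/example-a.html',        # protocol-relative
    '/products/example-b.html',                           # path-relative
    'https://www.lazada.com.ph/products/example-c.html',  # already absolute
]
for url in absolutize(sample):
    print(url)
```

Inside a Scrapy callback, response.urljoin(link) performs the same resolution against the response's own URL.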
settings.py file:
Add the following to your settings.py file:
# Middleware
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
# Selenium
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
# '--headless' if using chrome instead of firefox
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
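Note that which('chromedriver') returns None when the driver is not on PATH, which would bring back the original error. Below is a minimal sketch of a lookup that fails loudly instead; the helper name find_driver and its extra_dirs parameter are hypothetical and not part of scrapy-selenium.

```python
import os
from shutil import which


def find_driver(name='chromedriver', extra_dirs=()):
    """Search PATH plus any extra folders; raise instead of returning None."""
    # Combine the existing PATH with the extra folders, skipping empty entries.
    parts = [p for p in (os.environ.get('PATH', ''), *extra_dirs) if p]
    found = which(name, path=os.pathsep.join(parts))
    if found is None:
        raise FileNotFoundError(
            f"{name!r} was not found on PATH or in {list(extra_dirs)}")
    return found
```

In settings.py this could be used as SELENIUM_DRIVER_EXECUTABLE_PATH = find_driver(extra_dirs=['C:/Program Files (x86)/chromedriver']), assuming the driver lives in that folder.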