Right now I'm using scrapy.Selector to extract data from driver.page_source (Selenium). I'm looking for another way to do this without loading the Scrapy library. I don't want to use the driver.find_elements method.
from selenium import webdriver
from scrapy import Selector

driver = webdriver.Chrome()  # or whichever browser driver you use
driver.get(link)
page_source = driver.page_source
selector = Selector(text=page_source)
links = selector.xpath('//a[contains(@class, "jcs-JobTitle")]/@href').extract()
next_page = selector.xpath('//a[@aria-label="Next Page"]/@href').extract_first()
Answer:
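Use parsel. parsel is the selector library that Scrapy's Selector is built on, packaged on its own without the rest of Scrapy (install it with pip install parsel).
The only part of your code that needs changing is the imports. You can also switch from extract_first() and extract() to get() and getall(), which are the preferred names for the same methods.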
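from parsel import Selector  # pip install parsel
from selenium import webdriver

driver = webdriver.Chrome()  # or whichever browser driver you use
driver.get(link)
page_source = driver.page_source
selector = Selector(text=page_source)  # same constructor call as scrapy.Selector
links = selector.xpath('//a[contains(@class, "jcs-JobTitle")]/@href').getall()
next_page = selector.xpath('//a[@aria-label="Next Page"]/@href').get()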