I want to get the address, but my XPath returns an empty value. What am I doing wrong in the XPath? Here is the page link.
Code trials:
import scrapy
from scrapy import Selector
from scrapy_selenium import SeleniumRequest
from scrapy.http import Request


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://www.findtruckservice.com/search/?city=Florida, CO&mainCat=1&subCat=Truck Repair&lat=37.0731&lon=-106.247&cat_field=Mobile Repair - Truck Repair",
            wait_time=3,
            screenshot=True,
            callback=self.parse,
            dont_filter=True
        )

    def parse(self, response):
        books = response.xpath("//h3//a//@href").extract()
        for book in books:
            url = response.urljoin(book)
            yield Request(url, callback=self.parse_book)

    def parse_book(self, response):
        address = response.xpath("//div[1][@class='threecol align_left card']//div//text()").get()
        yield {
            'address': address
        }
Answer:
To print the desired text from the website, you need to induce WebDriverWait for visibility_of_element_located(), and then you can use either of the following locator strategies:
Using XPATH and the text attribute:
driver.get("https://www.findtruckservice.com/page/cummins-sales-and-service-farmington-nm-430653")
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[@class='sec-title' and text()='CONTACT']//following::div[@class='container']"))).text)
Using XPATH and get_attribute("textContent"):
driver.get("https://www.findtruckservice.com/page/cummins-sales-and-service-farmington-nm-430653")
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[@class='sec-title' and text()='CONTACT']//following::div[@class='container']"))).get_attribute("textContent"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
Cummins Sales and Service 1101 N Troy King Rd Farmington, NM 505-327-7331 (primary) 505-326-2948 (fax)
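A note on the textContent variant: .text returns the element's rendered, visible text, while get_attribute("textContent") returns the raw text of all descendant nodes, which can include the newlines and indentation from the HTML source. A minimal plain-Python sketch for tidying that string before printing it follows; normalize_whitespace is a hypothetical helper name and the sample string is invented, not captured from the page.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into a
    single space and trim both ends."""
    # str.split() with no argument splits on any whitespace run and drops
    # empty pieces, so re-joining with " " leaves single spaces only.
    return " ".join(text.split())


# Hypothetical raw textContent, with leftover indentation from the HTML source.
raw = "\n    Cummins Sales and Service\n    1101 N Troy King Rd\n\n    Farmington, NM\n  "
print(normalize_whitespace(raw))
```

Collapsing whitespace this way also merges the address lines into one string, so if you need the lines separately, split on newlines first and normalize each line on its own.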
References
Links to useful documentation:
- get_attribute() method: Gets the given attribute or property of the element.
- text attribute: Returns the text of the element.
- Difference between text and innerHTML using Selenium
Answer:
Try the following:
[...]
address = ' '.join([x.strip() for x in response.xpath("//div[@class='threecol align_left card'][1]/div[@class='container']/text()").extract()])
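The key change from the question's XPath is the order of the predicates. XPath applies predicates left to right, and a positional predicate counts only the nodes that survived the predicates before it: //div[1][@class='threecol align_left card'] first takes each parent's first div child and then checks its class, whereas //div[@class='threecol align_left card'][1] first keeps the matching div children and then takes the first of those. The pure-Python model below is only a toy illustration, not a real XPath engine; the function names are hypothetical and the page layout is invented, not the actual findtruckservice.com markup.

```python
from typing import List, Tuple

# Toy model of XPath predicate order; NOT a real XPath engine.
# Each inner list stands for one parent element and holds the class
# attribute of each of its <div> children, in document order.
Page = List[List[str]]
Hit = Tuple[int, int]  # (parent index, child index) of a selected <div>


def position_then_filter(page: Page, wanted: str) -> List[Hit]:
    """Models //div[1][@class=wanted]: pick the first <div> child of each
    parent, then keep it only if its class matches."""
    hits = []
    for p, children in enumerate(page):
        if children and children[0] == wanted:
            hits.append((p, 0))
    return hits


def filter_then_position(page: Page, wanted: str) -> List[Hit]:
    """Models //div[@class=wanted][1]: keep the <div> children of each
    parent whose class matches, then pick the first of those."""
    hits = []
    for p, children in enumerate(page):
        for c, cls in enumerate(children):
            if cls == wanted:
                hits.append((p, c))
                break  # only the first match per parent
    return hits


CARD = "threecol align_left card"
# Invented layout: in the first parent, the card is the 2nd <div> child.
page = [["header", CARD, CARD], [CARD, "footer"]]

print(position_then_filter(page, CARD))  # [(1, 0)]
print(filter_then_position(page, CARD))  # [(0, 1), (1, 0)]
```

The answer also narrows the selection to the direct div[@class='container'] child and its direct text() nodes and joins all of them, whereas .get() on //div//text() returns only the first matching text node, which may be pure whitespace. Which of these effects causes the empty result on this particular page is an assumption worth checking, for example in scrapy shell.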