I am using SelectorGadget to get the XPath of the "Read more" button in the first review on this Tripadvisor page.
This is the XPath it gave:
//*[contains(concat( " ", @class, " " ), concat( " ", "Z", " " ))]
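The concat( " ", @class, " " ) pattern pads the class attribute with a space on each side, and does the same to "Z", so contains() only matches elements that carry Z as a complete class name, not classes such as XZ that merely contain the letter. The hypothetical has_class_token function below reproduces that test in plain Python purely to illustrate the logic; in practice the browser evaluates the XPath itself.

```python
def has_class_token(class_attr: str, token: str) -> bool:
    """Plain-Python version of the XPath predicate
    contains(concat(" ", @class, " "), concat(" ", token, " ")).

    Like the XPath, it treats only spaces as separators between class names.
    """
    return f" {token} " in f" {class_attr} "


for attr in ["Z", "review Z expanded", "XZ", "ZZ"]:
    print(repr(attr), has_class_token(attr, "Z"))
# 'Z' True
# 'review Z expanded' True
# 'XZ' False
# 'ZZ' False
```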
Here is the first part of the code I am using:
import selenium
import csv #This package lets us save data to a csv file
from selenium import webdriver #The Selenium package we'll need
import time #This package lets us pause execution for a bit
from selenium.webdriver.common.by import By
path_to_file = "/Users/user/Desktop/HotelReviews.csv"
pages_to_scrape = 3
url = "https://www.tripadvisor.com/Hotel_Review-g60982-d209422-Reviews-Hilton_Waikiki_Beach-Honolulu_Oahu_Hawaii.html"
# open the file to save the review
csvFile = open(path_to_file, 'a', encoding="utf-8")
csvWriter = csv.writer(csvFile)
for i in range(0, pages_to_scrape):
    driver = webdriver.Chrome()
    driver.get("url")
    # give the DOM time to load
    time.sleep(2)
    driver.find_element_by_xpath("//*[contains(concat( " ", @class, " " ), concat( " ", "Z", " " ))], 'Read more')]").click()
This is the error I get:
File "/var/folders/6c/jpl964752rv_72zjclrp_8ym0000gn/T/ipykernel_24978/2812702568.py", line 8
driver.find_element_by_xpath("//*[contains(concat( " ", @class, " " ), concat( " ", "Z", " " ))], 'Read more')]").click()
^
SyntaxError: invalid syntax
It looks like the quotation marks are the issue.
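To confirm that the quotes are the culprit without opening a browser, both versions of the line can be handed to Python's built-in compile(), which parses source text but never runs it. In this sketch BROKEN_LINE, FIXED_LINE and parses are illustrative names. Both lines use the XPath exactly as SelectorGadget produced it, leaving out the trailing `, 'Read more')]` fragment from the question, which would not be valid XPath even once the quoting is fixed. The only change in FIXED_LINE is the single quotes around the XPath.

```python
# BROKEN_LINE mirrors the question's call: the XPath is wrapped in double
# quotes but also contains double quotes, so Python ends the string early.
BROKEN_LINE = '''driver.find_element_by_xpath("//*[contains(concat( " ", @class, " " ), concat( " ", "Z", " " ))]").click()'''

# FIXED_LINE wraps the same XPath in single quotes, so the inner double
# quotes are ordinary characters inside the Python string.
FIXED_LINE = '''driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "Z", " " ))]').click()'''


def parses(source: str) -> bool:
    """Return True if `source` is valid Python syntax (the code is never run)."""
    try:
        compile(source, "<xpath-demo>", "exec")
    except SyntaxError:
        return False
    return True


print("broken line parses:", parses(BROKEN_LINE))  # broken line parses: False
print("fixed line parses:", parses(FIXED_LINE))    # fixed line parses: True
```

Two further points, independent of the quoting: find_element_by_xpath was deprecated in Selenium 4 and removed in Selenium 4.3, so on current releases the call becomes driver.find_element(By.XPATH, xpath).click(); and driver.get("url") in the question passes the literal string "url", whereas driver.get(url) loads the address stored in the variable.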
So I followed this advice. I tried putting the XPath in a variable, but it raised the same error. I tried removing the extra quotes: same error. I tried removing the space between the quotes: same error.
I tried a different XPath, one for the whole review:
//*[contains(concat( " ", @class, " " ), concat( " ", "F1", " " ))]
Same error.
Then I tried adjusting the quotation marks in the first XPath:
driver.find_element_by_xpath("//*[contains(concat( " ", @class, " " ), concat( " ", "Z", " " ))]", "Read more")]).click()
This results in the same error.
Answer:
To click() the Read more link in the first review on the Tripadvisor website, you need to use WebDriverWait to wait for element_to_be_clickable(), and you can use the following locator strategy:
Using XPATH:
driver.get('https://www.tripadvisor.com/Hotel_Review-g60982-d209422-Reviews-Hilton_Waikiki_Beach-Honolulu_Oahu_Hawaii.html')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@placeholder='Search reviews']//following::div[@data-test-target='HR_CC_CARD']//span[text()='Read more']"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
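The WebDriverWait call replaces the fixed time.sleep(2) from the question with an explicit wait: it keeps re-checking a condition until the condition succeeds or a timeout expires. The sketch below is a simplified, pure-Python illustration of that polling idea, not Selenium's own code. wait_until, WaitTimeout and fake_read_more_button are hypothetical names, and LookupError stands in for Selenium's NoSuchElementException, which WebDriverWait ignores by default. The 0.5 s default interval mirrors WebDriverWait's default poll_frequency.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll_frequency=0.5, ignored=(LookupError,)):
    """Call `condition()` until it returns a truthy value, then return that value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises WaitTimeout if the condition is still not met after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # e.g. the element is not in the DOM yet; check again later
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: a fake "Read more" button that only "appears" on the third check.
checks = {"count": 0}


def fake_read_more_button():
    checks["count"] += 1
    if checks["count"] < 3:
        raise LookupError("element not found yet")
    return "Read more button"


button = wait_until(fake_read_more_button, timeout=2, poll_frequency=0.01)
print(button, "found after", checks["count"], "checks")
# Read more button found after 3 checks
```

Unlike a fixed sleep, the wait returns as soon as the element is ready and fails loudly if it never is.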
Answer:
The basic problem is that while, for example, a[x="3"] is a valid XPath expression, you can't put it in a Python string literal as "a[x="3"]" without escaping the quotes. I'm not a Python user, but in most languages you would write "a[x=\"3\"]"; alternatively, in XPath single and double quotes can be used interchangeably, so you could write "a[x='3']".
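Swapping quote styles works as long as the text being matched does not itself contain both kinds of quote. XPath 1.0 has no escape sequence for a quote inside a string literal, so a common workaround for arbitrary text (a review snippet, say) is to rebuild the string with concat(). The xpath_literal helper below is a hypothetical sketch of that idea: it returns an XPath expression, either a plain literal or a concat() call, that evaluates to the given text and can be spliced into a locator.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 expression that evaluates to the string `text`."""
    if '"' not in text:
        return f'"{text}"'  # no double quotes: wrap in double quotes
    if "'" not in text:
        return f"'{text}'"  # no single quotes: wrap in single quotes
    # Both quote types present: XPath 1.0 literals cannot escape quotes,
    # so split on double quotes and rebuild the string with concat().
    parts = text.split('"')
    pieces = []
    for i, part in enumerate(parts):
        if part:
            pieces.append(f'"{part}"')
        if i < len(parts) - 1:
            pieces.append("'\"'")  # one double-quote character, wrapped in single quotes
    return "concat(" + ", ".join(pieces) + ")"


print(xpath_literal("3"))                # "3"
print(xpath_literal('He said "great"'))  # 'He said "great"'
print(xpath_literal('It\'s "great"'))    # concat("It's ", '"', "great", '"')

# Spliced into a locator for the button text from the question:
locator = f"//span[text()={xpath_literal('Read more')}]"
print(locator)  # //span[text()="Read more"]
```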