I am web-scraping reviews from Goodreads for a project. Here's an example of a page I've been trying: https://www.goodreads.com/book/show/2767052-the-hunger-games/reviews?
The reviews page initially shows 30 reviews with a 'Show More' button at the bottom. Selenium seems unable to click the button.
Here is the code I'm using:
showmore_button = driver.find_element(By.XPATH, '/html/body/div[1]/div/main/div[1]/div[2]/div[4]/div[4]/div/button/span[1]')
driver.execute_script("arguments[0].click();", showmore_button)
I have also tried
showmore_button.click()
but that raises an exception stating that the element is not clickable.
For more context my driver is set up like this:
def createdriver():
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("start-maximized")
    options.add_argument('--window-size=1920,1080')
    options.add_argument("--incognito")
    driver = webdriver.Chrome(options=options)
    return driver
and then I use:
driver = createdriver()
driver.get(url)
where url is the reviews page I'm trying to scrape.
CodePudding user response:
The button is not displayed the instant the page starts loading; it appears a short time afterwards, so you need to use Selenium waits. Try setting an implicit wait after creating the driver instance, as in the code below:
driver = createdriver()
driver.implicitly_wait(10)
driver.get(url)
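With this setting, each find_element call keeps polling for up to 10 seconds before raising NoSuchElementException. Note that an implicit wait only waits for the element to be present in the DOM, not for it to be clickable.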
Another suggestion: as a best practice, use a relative XPath instead of an absolute one. A relative XPath is more robust, because an absolute XPath may stop working if the DOM structure changes in the future. Try the relative XPath below:
showmore_button = driver.find_element(By.XPATH, '//span[contains(text(),"Show more reviews")]')
driver.execute_script("arguments[0].click();", showmore_button)
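To illustrate why the relative XPath is more robust, here is a minimal sketch that uses Python's standard xml.etree.ElementTree as a stand-in for the browser DOM. The two tiny documents and the names ABSOLUTE, RELATIVE and matches are made up for the illustration; they are not Goodreads' real markup, and ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup (NOT the real Goodreads page).
original = ET.fromstring(
    "<html><body><div><button><span>Show more reviews</span>"
    "</button></div></body></html>"
)
# Same page after a redesign wraps the content in an extra <main> element.
changed = ET.fromstring(
    "<html><body><main><div><button><span>Show more reviews</span>"
    "</button></div></main></body></html>"
)

# Absolute-style path: spells out every step from the root element.
ABSOLUTE = "./body/div/button/span"
# Relative-style path: any <span> in the tree with this exact text.
RELATIVE = ".//span[.='Show more reviews']"


def matches(tree, path):
    """Return True if the path selects at least one element in the tree."""
    return tree.find(path) is not None


for name, tree in (("original", original), ("changed", changed)):
    print(name, "absolute:", matches(tree, ABSOLUTE),
          "relative:", matches(tree, RELATIVE))
```

Both paths match the original document, but only the relative one still matches once the extra wrapper element is added.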
CodePudding user response:
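To click the Show more reviews element at the bottom of the page, you first need to bring that part of the page into view. Induce WebDriverWait for visibility_of_element_located() on the "Displaying 1 - ..." counter above the button, scroll it into view with scrollIntoView(), and then click the button using the following locator strategy: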
Code block:
driver.get('https://www.goodreads.com/book/show/2767052-the-hunger-games/reviews?')
time.sleep(5)
driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='ReviewsList__listContext ReviewsList__listContext--centered']//span[contains(., 'Displaying 1 -')]"))))
driver.execute_script("arguments[0].click();", driver.find_element(By.XPATH, "//span[text()='Show more reviews']"))
Note: You have to add the following imports :
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
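If you need more than one extra batch of reviews, the same scroll-then-click step can be repeated until the button stops appearing. The following is only a sketch under a few assumptions: the helper names find_when_present and click_show_more are hypothetical, it assumes the button eventually disappears once everything is loaded (I have not verified that on Goodreads), and it uses a small hand-rolled polling loop instead of WebDriverWait so that it needs no Selenium imports. It relies only on the driver's find_elements and execute_script methods and on the element's is_displayed method.

```python
import time

# Locator taken from the answer above.
SHOW_MORE_XPATH = "//span[text()='Show more reviews']"


def find_when_present(driver, xpath, timeout=10.0, poll=0.5):
    """Poll until a displayed element matches xpath; return None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        # "xpath" is the string value of Selenium's By.XPATH constant.
        for element in driver.find_elements("xpath", xpath):
            if element.is_displayed():
                return element
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def click_show_more(driver, max_clicks=5, timeout=10.0, poll=0.5):
    """Click the button up to max_clicks times; return the number of clicks."""
    clicks = 0
    while clicks < max_clicks:
        button = find_when_present(driver, SHOW_MORE_XPATH, timeout, poll)
        if button is None:
            break  # assumption: no button means nothing more to load
        # Scroll the button to the middle of the viewport, then click via JS.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", button
        )
        driver.execute_script("arguments[0].click();", button)
        clicks += 1
    return clicks
```

Usage would be something like clicks = click_show_more(driver, max_clicks=10) after driver.get(url). The max_clicks cap is there so the loop cannot run forever if the button never goes away.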