Selenium Can't Scrape Vivino Information?

Time:10-28

I'm trying to obtain tasting notes and food pairing information about wines from Vivino that can't be accessed from their API, but am getting NoSuchElementException when using Selenium in Python. I've been able to scrape price and year information, but not the data further down.


I've tried using WebDriverWait to let the page load:

driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@data-testid='mentions']")))

I've tried to use XPath to get the keywords citrus, tropical, tree fruits,...:

tasting_notes = driver.find_elements(By.XPATH, "//div[@data-testid='mentions']")

I've tried getting the text itself using the class name:

#test = driver.find_elements(By.CLASS_NAME,"tasteNote__flavorGroup--1Uaen")

and keep getting NoSuchElementException. Is there an alternative way I can access the information or is Vivino somehow blocking me from scraping this section?

Edit: I've tried including code that scrolls to the bottom before trying to find the data:

    # Record the starting height so the first comparison has a baseline.
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to the bottom.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait for the page to load.
        time.sleep(2)

        # Calculate the new scroll height and compare it with the last scroll height.
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

and still have the problem.

Edit: Solved! Thank you furas for your explanation and Eugeny for the code.

CodePudding user response:

As furas mentioned in the comments, this page lazy-loads its content, so you need to scroll the page. Jumping straight to the bottom doesn't help here, because the page only loads the content that is currently in view. You need to scroll down to the bottom gradually instead.
Here is one way to do it. It may not be the most elegant solution, but it works :)

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
driver.implicitly_wait(10)
page_height = driver.execute_script("return document.body.scrollHeight")
browser_window_height = driver.get_window_size(windowHandle='current')['height']
current_position = driver.execute_script('return window.pageYOffset')
# Scroll down one window height at a time so each lazy-loaded section gets rendered.
while page_height - current_position > browser_window_height:
    # scrollTo takes (x, y); keep x at 0 and only move vertically.
    driver.execute_script(f"window.scrollTo(0, {browser_window_height + current_position});")
    current_position = driver.execute_script('return window.pageYOffset')
    sleep(1)  # It is necessary here to give it some time to load the content
print(driver.find_element(By.XPATH, '//div[@data-testid="mentions"]').text)
driver.quit()
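
If you need this in more than one place, the same gradual-scroll idea can be wrapped in a small helper. This is only a sketch, not something from the original answer: the name scroll_in_steps and its parameters (step_fraction, pause, max_steps) are made up for illustration. It assumes nothing about the driver beyond an execute_script method, and it re-reads the page height on every pass in case lazy loading makes the page grow.

```python
import time


def scroll_in_steps(driver, step_fraction=0.8, pause=1.0, max_steps=200):
    """Scroll down a partial viewport at a time so lazy-loaded sections render.

    `driver` is any object with an `execute_script(script, *args)` method,
    such as a Selenium WebDriver. Returns the number of scroll steps taken.
    """
    steps = 0
    while steps < max_steps:
        viewport = driver.execute_script("return window.innerHeight")
        position = driver.execute_script("return window.pageYOffset")
        # Re-read the height each pass: lazy loading can make the page longer.
        page_height = driver.execute_script("return document.body.scrollHeight")
        if position + viewport >= page_height:
            break  # the bottom of the page is already in view
        # Move down by less than a full viewport so sections overlap between steps.
        driver.execute_script(
            "window.scrollBy(0, arguments[0]);", int(viewport * step_fraction)
        )
        steps += 1
        time.sleep(pause)  # give the page time to load the newly visible content
    return steps
```

You would call scroll_in_steps(driver) right after driver.get(...) and then look up the //div[@data-testid="mentions"] element as before. The max_steps cap is there so a page that keeps growing can't trap the loop forever.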