I'm trying to obtain tasting notes and food pairing information about wines from Vivino that can't be accessed from their API, but am getting NoSuchElementException when using Selenium in Python. I've been able to scrape price and year information, but not the data further down.
The page I'm trying to scrape from
I've tried using WebDriverWait to let the page load:
driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@data-testid='mentions']")))
I've tried to use XPath to get the keywords (citrus, tropical, tree fruits, ...):
tasting_notes = driver.find_elements(By.XPATH, "//div[@data-testid='mentions']")
I've tried getting the text itself using the class name:
#test = driver.find_elements(By.CLASS_NAME,"tasteNote__flavorGroup--1Uaen")
and keep getting NoSuchElementException. Is there an alternative way I can access the information, or is Vivino somehow blocking me from scraping this section?
Edit: I've tried including code that scrolls to the bottom before trying to find the data:
# Record the starting scroll height so the loop has something to compare against.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load the page.
    time.sleep(2)
    # Calculate new scroll height and compare with last scroll height.
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
and still have the problem.
Edit: Solved! Thank you furas for your explanation and Eugeny for the code.
Answer:
As furas mentioned in the comments, this page uses lazy loading, so you need to scroll the page. Jumping straight to the bottom doesn't help here, because the page only loads the content that is currently in view. You need to scroll down gradually instead, so that every section passes through the viewport.
Here is one way to do it. I'm not sure it's the most elegant solution, but it works :)
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
driver.implicitly_wait(10)
page_height = driver.execute_script("return document.body.scrollHeight")
browser_window_height = driver.get_window_size(windowHandle='current')['height']
current_position = driver.execute_script('return window.pageYOffset')
# Scroll down one window height at a time so each lazy-loaded section gets triggered.
while page_height - current_position > browser_window_height:
    # scrollTo takes (x, y): keep x at 0 and move y down by one window height.
    driver.execute_script(f"window.scrollTo(0, {browser_window_height + current_position});")
    current_position = driver.execute_script('return window.pageYOffset')
    sleep(1)  # It is necessary here to give it some time to load the content
print(driver.find_element(By.XPATH, '//div[@data-testid="mentions"]').text)
driver.quit()
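If you want to reuse the gradual scrolling on other pages, the loop can be pulled out into a small helper. This is only a sketch: the function name `scroll_gradually` and its parameters `step`, `pause` and `max_steps` are made up for illustration, and it assumes nothing about the driver beyond an `execute_script` method like the one on Selenium's WebDriver. It re-reads the page height on every pass, since lazy loading can make the page grow while you scroll.

```python
import time


def scroll_gradually(driver, step=600, pause=1.0, max_steps=200):
    """Scroll down in fixed steps until the bottom of the page is reached.

    Hypothetical helper: `driver` only needs an `execute_script` method.
    Returns the number of scroll steps that were performed.
    """
    steps = 0
    while steps < max_steps:  # hard cap so a page that keeps growing cannot loop forever
        # Re-read the sizes each pass, since lazy loading can make the page grow.
        page_height = driver.execute_script("return document.body.scrollHeight")
        position = driver.execute_script("return window.pageYOffset")
        viewport = driver.execute_script("return window.innerHeight")
        # Allow one pixel of tolerance for fractional scroll offsets.
        if position + viewport >= page_height - 1:
            break
        # Extra arguments to execute_script are exposed to the script as arguments[n].
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)  # give the newly visible section time to load
        steps += 1
    return steps
```

You would call `scroll_gradually(driver)` after `driver.get(...)` and before `find_element`. A smaller `step` is slower but less likely to skip past a section before it starts loading.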