presence_of_all_elements_located does not return the full list, web scraping with Selenium

Time:07-24

I am trying to get the food names from this menu, but I only get 6 items instead of the full list. In Chrome DevTools I can clearly see that far more than 6 elements have the class name I am targeting.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 takes the chromedriver path through a Service object
path = "/Users/ruhanamirza/Downloads/chromedriver"
driver = webdriver.Chrome(service=Service(path))
driver.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
try:
    # Wait up to 120 seconds for menu item containers to appear in the DOM
    modules = WebDriverWait(driver, 120).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "MenuItem-module_content__mNrbB"))
    )
    for module in modules:
        name = module.find_element(By.CLASS_NAME, "MenuItem-module_name__iqvnU")
        print(name.text)
finally:
    driver.quit()

CodePudding user response:

To print the text of the items, wait for visibility_of_all_elements_located() instead of presence_of_all_elements_located(), then collect the text with a list comprehension. Either of the following locator strategies works:

  • Using CSS_SELECTOR:

    driver.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
    # Click the "accept" button first (presumably the consent banner)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@data-localization-key, 'accept')]//div[starts-with(@class, 'Button__Content')]"))).click()
    # Wait until the item names are visible, then collect their text
    names = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p[data-test-id='menu-item.name']")))
    print([my_elem.text for my_elem in names])
    
  • Using XPATH:

    driver.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
    # Click the "accept" button first (presumably the consent banner)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@data-localization-key, 'accept')]//div[starts-with(@class, 'Button__Content')]"))).click()
    # Wait until the item names are visible, then collect their text
    names = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@data-test-id='menu-item.name']")))
    print([my_elem.text for my_elem in names])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    ['Hillside Blush Rose, 750 ml', 'Hillside Classico, 750 ml', 'Hillside Saperavi, 750 ml', 'Hillside Reserve, 750 ml', 'Hillside Pinot Grigio, 750 ml', 'Hillside Image, 750 ml', 'Hillside Prestige, 750 ml', 'Hillside Caucasus, 750 ml', 'Hillside Rose, 750 ml']
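
One detail that may help explain the original result: presence_of_all_elements_located() returns as soon as at least one matching element is in the DOM, so it does not wait for the full list to render. If the expected number of items is roughly known, a custom wait condition can be used instead. The sketch below is only an illustration: at_least_n_elements is a hypothetical helper, not part of Selenium. It relies on the fact that WebDriverWait.until() accepts any callable that takes the driver and keeps polling while that callable returns a falsy value.

```python
def at_least_n_elements(locator, minimum):
    """Build a wait condition that succeeds once `minimum` elements match.

    `locator` is the usual (By.<strategy>, value) tuple.
    This helper is hypothetical; it is not a Selenium built-in.
    """
    def condition(driver):
        elements = driver.find_elements(*locator)
        # Returning False makes WebDriverWait.until() poll again;
        # returning the list ends the wait and hands the elements back.
        return elements if len(elements) >= minimum else False

    return condition
```

It would be passed to the wait like any built-in condition, for example WebDriverWait(driver, 20).until(at_least_n_elements((By.CSS_SELECTOR, "p[data-test-id='menu-item.name']"), 9)). On a page that only renders items while scrolling, the count may never be reached and the wait will time out.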
    

CodePudding user response:

Loading the page with Selenium shows that menu items are created dynamically as the page scrolls, with images pulled from URLs like https://imageproxy.wolt.com/menu/menu-images/5e75dca9494db98d926e52e3/a96d46e6-feb0-11ec-9cdb-d605cab88f2d_img_20220708_132819.jpeg?w=200. That would explain why only the items rendered so far are found. One way around this is to scroll the page in steps of roughly 75% of the visible viewport, collect the items in view into a list, and repeat until the bottom of the page is reached. Another way is to skip the browser entirely and fetch the page with requests and BeautifulSoup, as in the requests code below.
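
The scrolling approach described above could be sketched roughly as follows. This is an untested illustration, not a drop-in solution: collect_names_while_scrolling, its parameters, and the default values are all hypothetical, and the locator is the one used in the other answer. It expects an already opened Selenium driver and only uses driver.find_elements() and driver.execute_script().

```python
import time

# "css selector" is the string value of selenium's By.CSS_SELECTOR
MENU_ITEM_LOCATOR = ("css selector", "p[data-test-id='menu-item.name']")


def collect_names_while_scrolling(driver, locator=MENU_ITEM_LOCATOR,
                                  step_fraction=0.75, pause=0.5, max_steps=500):
    """Scroll down in steps and collect the text of matching elements.

    Assumptions: names are unique (duplicates are dropped) and the page
    height stops growing at some point; max_steps guards against pages
    that never report reaching the bottom.
    """
    names, seen = [], set()
    for _ in range(max_steps):
        # Collect whatever is rendered at the current scroll position
        for element in driver.find_elements(*locator):
            text = element.text.strip()
            if text and text not in seen:
                seen.add(text)
                names.append(text)
        # Stop once the viewport has reached the bottom of the document
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.scrollY"
            " >= document.documentElement.scrollHeight - 1;"
        )
        if at_bottom:
            break
        # Scroll by a fraction of the viewport so consecutive views overlap
        driver.execute_script(
            "window.scrollBy(0, Math.floor(window.innerHeight * arguments[0]));",
            step_fraction,
        )
        time.sleep(pause)  # give the page time to render new items
    return names
```

If the page removes elements from the DOM while scrolling, reading element.text can raise a stale element error, so a real script would likely need to handle that case.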

These are wine names, not food names, though. Is this what you're after?

import requests
from bs4 import BeautifulSoup

r = requests.get('https://wolt.com/az/aze/baku/venue/gurmania-winter-park')
soup = BeautifulSoup(r.text, 'html.parser')
titles = soup.find_all('p', {'data-test-id': 'menu-item.name'})
for t in titles:
    title = t.text
    print(title)

Result:

Hillside Blush Rose, 750 ml
Hillside Classico, 750 ml
Hillside Saperavi, 750 ml
Hillside Reserve, 750 ml
Hillside Pinot Grigio, 750 ml
Hillside Image, 750 ml
Hillside Prestige, 750 ml
Hillside Caucasus, 750 ml
Hillside Rose, 750 ml
Gurmania Rkatsiteli
Madrasa by Gurmania 750 ml
Yarimada Shiraz Rose 2015 , 750 ml
Yarimada Chardonnay 2014, 750 ml
Merlot by Gurmania 750 ml
Yarimada Madrasa Rose 2012 , 750 ml
Meyseri Mercan 2018 Orqanik 750 ml
Gurmania® Rose, 750 ml
Chabiant Vino Raro, 750 ml
Gurmania Saperavi Turş
Meyseri Innabi 2018 750 ml
Yarimada Cabernet Sauvignon 2016, 750 ml
Syrah by Gurmania 750 ml
Yarimada Muscat 2014, 750 ml
Meyseri Sedef Orqanik 750 ml
Meyseri Bulluri 2018 750 ml
Pomegranat by Gurmania 750 ml
Hillside Cuvee Qırmızı Turş, 750 ml
Hillside Nectar Red Desert , 750 ml
Hillside Chardonnay Ağ Kəmturş, 750 ml
Hillside Pomegranate, 750 ml
Hillside Sauvignon Blanc Ağ Turş, 750 ml
Spiritus Vini, Matrasa 2018 750 ml
Traminer by Gurmania 750 ml
Muscat by Gurmania 750 ml
Alazani Valley Qırmızı by Gurmania 750 ml
Alazani Valley Ağ by Gurmania 750 ml
Chardonnay by Gurmania 750 ml
Chabian Bayan Shira
Vine Ponto® Mtsvane, 750 ml
Vine Ponto® Кhikhvi, 750 ml
Vine Ponto® Rkatsiteli, 750 ml
Zigu
Shavnabada Monastery Wine 750ml
8000 Millennium, 750 ml
Okro Gold Kəmşirin, 750 ml
Marani Milorauli, Trio Turş Kəhrəba Şərabı, 750ml
Gogushika Kvevri Wine Kisi, 750 ml
Umano Tsinandali, 750 ml
Mtsvane White Kəmşirin, 750 ml
Usakhelauri®, 750 ml

##################

Another solution, using only requests and the website's API:

import requests
import pandas as pd

# Fetch the venue menu as JSON from the API
r = requests.get('https://restaurant-api.wolt.com/v4/venues/slug/gurmania-winter-park/menu?unit_prices=true&show_weighted_items=true')
obj = r.json()['items']
# One row per menu item
df = pd.DataFrame(obj)
print(df)

This returns a dataframe with 1609 rows × 31 columns:

    First three rows, shown transposed (one line per column) for readability:

    column                      | row 0 | row 1 | row 2
    alcohol_percentage          | 120 | 120 | 120
    allowed_delivery_methods    | [takeaway, homedelivery] | [takeaway, homedelivery] | [takeaway, homedelivery]
    baseprice                   | 1360 | 2240 | 1360
    category                    | 000000000000000000000002 | 000000000000000000000002 | 000000000000000000000002
    checksum                    | 52ad71bc4371a195c9a6e91f26ab977f | 53718ff72c4d46f753b736c638b7a697 | 9830017ad47cb0de5eaf58ceb9130c7f
    description                 | (empty) | (empty) | (empty)
    dietary_preferences         | [] | [] | []
    disabled_info               | None | None | None
    enabled                     | True | True | True
    exclude_from_discounts      | False | False | False
    has_extra_info              | False | False | False
    id                          | 62d145d3a67304b52f01e9c8 | 62d145d3a67304b52f01e9cb | 62d145d3a67304b52f01e9cc
    image                       | https://wolt-menu-images-cdn.wolt.com/menu-ima... | https://wolt-menu-images-cdn.wolt.com/menu-ima... | https://wolt-menu-images-cdn.wolt.com/menu-ima...
    image_blurhash              | j1slBb4I008y2;cRXr8QYM4y;LTb | j1nQQ;Lu8y0h0z8Q;:pl804yTtXK | j1mA8PLu;K4i0Jh4XscQYw4igQXs
    mandatory_warnings          | [] | [] | []
    max_quantity_per_purchase   | None | None | None
    name                        | Hillside Blush Rose, 750 ml | Hillside Classico, 750 ml | Hillside Saperavi, 750 ml
    no_contact_delivery_allowed | False | False | False
    options                     | [] | [] | []
    original_price              | 1700.0 | 2800.0 | 1700.0
    quantity_left               | None | None | None
    quantity_left_visible       | False | False | False
    restrictions                | [{'age_limit': None, 'type': 'alcohol'}] | [{'age_limit': None, 'type': 'alcohol'}] | [{'age_limit': None, 'type': 'alcohol'}]
    sell_by_weight_config       | None | None | None
    tags                        | [] | [] | []
    times                       | [{'available_days_of_week': [1, 2, 3, 4, 5, 6,... | [{'available_days_of_week': [1, 2, 3, 4, 5, 6,... | [{'available_days_of_week': [1, 2, 3, 4, 5, 6,...
    type                        | deal | deal | deal
    unit_info                   | None | None | None
    unit_price                  | None | None | None
    validity                    | {'end_date': 1658691900000, 'start_date': 1657... | {'end_date': 1658691900000, 'start_date': 1657... | {'end_date': 1658691900000, 'start_date': 1657...
    vat_percentage              | 18 | 18 | 18
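
Since the question only asks for item names, they can also be read straight from the JSON without pandas. A minimal sketch, assuming the response keeps the shape shown above (an items list whose entries carry name and enabled keys); item_names and its only_enabled flag are hypothetical names chosen for illustration:

```python
def item_names(menu_payload, only_enabled=True):
    """Return the item names from a decoded menu response.

    Assumes `menu_payload` is a dict with an "items" list, and that each
    item is a dict with "name" and (optionally) "enabled" keys.
    """
    names = []
    for item in menu_payload.get("items", []):
        # Skip items the venue has switched off, unless asked to keep them
        if only_enabled and not item.get("enabled", True):
            continue
        name = item.get("name")
        if name:
            names.append(name)
    return names
```

With the response from the snippet above, item_names(r.json()) would give the names as a plain list.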