I am trying to scrape a website with restaurant information, but the page does not seem to load fully and the content I need is missing.
I tried different tools such as Selenium and BeautifulSoup and hit the same issue.
My process so far:
Scrapy (shell):
fetch('https://www.talabat.com/egypt/restaurant/643637/dominos-pizza-kafr-abdo?aid=7123')
response.css('div.accordionstyle__AccordionContainer-h3jkuk-0 gnYKPd')
[]
Beautiful Soup:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.talabat.com/egypt/restaurant/643637/dominos-pizza-kafr-abdo?aid=7123')
soup = bs(r.content, 'html.parser')
resturant_cats = soup.find(class_='mt-2')
resturant_items = soup.find_all(class_='content open')
resturantmenu = pd.DataFrame(columns=['ItemName', 'ItemCat', 'ItemPrice', 'ItemDesc'], index=range(len(resturant_items)))
# resturant_cats[1].find(class_='f-15').get_text()
print(resturant_items)
Both attempts come back without the target data.
On further inspection, the fetched page source contains this HTML:
<div >
<div >
<input placeholder="Search menu item" type="text" value=""/>
<svg aria-hidden="true" data-icon="search" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg">
<path d="M505 442.7L405.3 343c-4.5-4.5-10.6-7-17-7H372c27.6-35.3 44-79.7 44-128C416 93.1 322.9 0 208 0S0 93.1 0 208s93.1 208 208 208c48.3 0 92.7-16.4 128-44v16.3c0 6.4 2.5 12.5 7 17l99.7 99.7c9.4 9.4 24.6 9.4 33.9 0l28.3-28.3c9.4-9.4 9.4-24.6.1-34zM208 336c-70.7 0-128-57.2-128-128 0-70.7 57.2-128 128-128 70.7 0 128 57.2 128 128 0 70.7-57.2 128-128 128z" fill="currentColor">
</path>
CodePudding user response:
from bs4 import BeautifulSoup
import requests
import json
import pandas as pd


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # The menu is embedded as JSON in the <script id="__NEXT_DATA__"> tag,
    # so parse that instead of looking for rendered menu elements.
    goal = json.loads(soup.select_one('#__NEXT_DATA__').text)
    df = pd.DataFrame(goal['props']['pageProps']
                      ['initialMenuState']['menuData']['items'])
    print(df)


main('https://www.talabat.com/egypt/restaurant/643637/dominos-pizza-kafr-abdo?aid=7123')
Output:
           id                           name  ...  isItemDiscount  isWithImage
0   770227845                     Meal for 4  ...           False        False
1   770227846                     Meal for 6  ...           False        False
2   770227847                 Barbecue Feast  ...           False        False
3   770227848                   Deluxe Feast  ...           False        False
4   770227849                     Margherita  ...           False        False
..        ...                            ...  ...             ...          ...
81  770227928                    RANCH SAUCE  ...           False        False
82  770227929                    Sugar Glaze  ...           False        False
83  770227932            Cinna Stix 8 Pieces  ...           False        False
84  770227933  Pineapple Cinna Stix 8 Pieces  ...           False        False
85  770227934         Chocolate Lava Souffle  ...           False        False

[86 rows x 14 columns]
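
The code above works because the menu is shipped as a JSON blob inside the `<script id="__NEXT_DATA__">` tag, which is why searching for the rendered menu elements finds nothing. If you want something that fails more gently when the script tag or one of the nested keys is missing, here is a rough sketch. Note that it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs, and that `NextDataParser`, `extract_next_data`, `dig`, `MENU_PATH` and `SAMPLE_HTML` are names I made up for illustration. The sample page is a tiny hypothetical stand-in, not the real response from the site.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several pieces, so collect them all.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON blob, or None if the script tag is absent or empty."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    return json.loads(raw) if raw.strip() else None


def dig(data, path):
    """Walk nested dicts along path; return None as soon as a key is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Key path taken from the answer above; it may change if the site is updated.
MENU_PATH = ["props", "pageProps", "initialMenuState", "menuData", "items"]

# Hypothetical miniature page, only for demonstrating the helpers offline.
SAMPLE_HTML = """
<html><body>
<div id="__next"><input placeholder="Search menu item" type="text"/></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"initialMenuState": {"menuData": {"items":
  [{"id": 1, "name": "Margherita"}]}}}}}
</script>
</body></html>
"""

items = dig(extract_next_data(SAMPLE_HTML), MENU_PATH)
print(items)
```

With the real page you would pass `requests.get(url).text` to `extract_next_data` and hand the resulting list to `pd.DataFrame`, checking for `None` first so a layout change shows up as a clear message instead of a `KeyError` or `AttributeError`.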