Web scraping with Scrapy: webpage doesn't fully load

Time:01-03

I am trying to scrape a website that lists restaurant information.

The page does not seem to load fully; the content I need is missing from the response.

I tried different tools, including Selenium and BeautifulSoup, and hit the same issue.

My process so far:

Scrapy (in the Scrapy shell):

fetch('https://www.talabat.com/egypt/restaurant/643637/dominos-pizza-kafr-abdo?aid=7123')
response.css('div.accordionstyle__AccordionContainer-h3jkuk-0.gnYKPd')
[]

Beautiful Soup:

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.talabat.com/egypt/restaurant/643637/dominos-pizza-kafr-abdo?aid=7123')
# Name the parser explicitly so the result does not depend on what is installed.
soup = bs(r.content, 'html.parser')

resturant_cats = soup.find(class_='mt-2')
resturant_items = soup.find_all(class_='content open')
resturantmenu = pd.DataFrame(columns=['ItemName', 'ItemCat', 'ItemPrice', 'ItemDesc'],
                             index=range(len(resturant_items)))
# resturant_cats[1].find(class_='f-15').get_text()

print(resturant_items)

Both approaches return results without the target data.

Upon further inspection, the source that actually gets loaded contains this HTML:

<div >
              <div >
               <input  placeholder="Search menu item" type="text" value=""/>
               <svg aria-hidden="true"  data-icon="search" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg">
                <path d="M505 442.7L405.3 343c-4.5-4.5-10.6-7-17-7H372c27.6-35.3 44-79.7 44-128C416 93.1 322.9 0 208 0S0 93.1 0 208s93.1 208 208 208c48.3 0 92.7-16.4 128-44v16.3c0 6.4 2.5 12.5 7 17l99.7 99.7c9.4 9.4 24.6 9.4 33.9 0l28.3-28.3c9.4-9.4 9.4-24.6.1-34zM208 336c-70.7 0-128-57.2-128-128 0-70.7 57.2-128 128-128 70.7 0 128 57.2 128 128 0 70.7-57.2 128-128 128z" fill="currentColor">
                </path>

CodePudding user response:

from bs4 import BeautifulSoup
import requests
import json
import pandas as pd


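def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # The page is a Next.js app: its data is embedded as JSON in the
    # <script id="__NEXT_DATA__"> tag, so no JavaScript rendering is needed.
    goal = json.loads(soup.select_one('#__NEXT_DATA__').text)
    # The menu items sit under this nested path in the JSON payload.
    df = pd.DataFrame(goal['props']['pageProps']
                      ['initialMenuState']['menuData']['items'])
    print(df)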


main('https://www.talabat.com/egypt/restaurant/643637/dominos-pizza-kafr-abdo?aid=7123')

Output:

           id                           name  ... isItemDiscount  isWithImage
0   770227845                     Meal for 4  ...          False        False
1   770227846                     Meal for 6  ...          False        False
2   770227847                 Barbecue Feast  ...          False        False
3   770227848                   Deluxe Feast  ...          False        False
4   770227849                     Margherita  ...          False        False
..        ...                            ...  ...            ...          ...
81  770227928                    RANCH SAUCE  ...          False        False
82  770227929                    Sugar Glaze  ...          False        False
83  770227932            Cinna Stix 8 Pieces  ...          False        False
84  770227933  Pineapple Cinna Stix 8 Pieces  ...          False        False
85  770227934         Chocolate Lava Souffle  ...          False        False

[86 rows x 14 columns]
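
Since the question was about Scrapy, here is a rough sketch of the same idea without BeautifulSoup, using only the standard library. The names NextDataParser, extract_next_data and dig are made up for illustration, and sample_html is a tiny invented page standing in for the real response; the key path is the one used in the answer above and may change if the site changes its payload.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text of <script id="__NEXT_DATA__"> from raw HTML."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True while inside the target script tag
        self._chunks = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    @property
    def payload(self):
        return "".join(self._chunks)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads(parser.payload) if parser.payload else None


def dig(data, *keys, default=None):
    """Follow nested dict keys, returning `default` if any key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


# Hypothetical miniature page; the real one carries far more fields.
sample_html = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"initialMenuState": {"menuData": {"items": [
  {"id": 1, "name": "Margherita"}
]}}}}}
</script>
</body></html>
"""

data = extract_next_data(sample_html)
items = dig(data, "props", "pageProps", "initialMenuState", "menuData",
            "items", default=[])
print([item["name"] for item in items])
```

In a Scrapy spider or the Scrapy shell, you would pass response.text to extract_next_data instead of sample_html. The dig helper is only there so that a renamed or missing key yields an empty list rather than a KeyError.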