Grubhub scraping: How to extract the full html

Time:08-21

I am scraping Grubhub and I am not able to scrape the full menu.

https://www.grubhub.com/restaurant/buca-di-beppo-1875-s-bascom-ave-campbell/335944

For example, on the page above it only scrapes the appetizers. Scrolling is required to load the rest, but the captcha detects that the browser is automated (with Selenium) and I cannot scrape any further.

Here is what I have:

driver.get(link)
time.sleep(2)
page = driver.page_source
soup = BeautifulSoup(page, 'html.parser')
dishes = soup.find_all('div', class_='menuItemNew-name')
descs = soup.find_all('div', class_='padding-y-2')
dishes_ = []
descs_ = []
# Accumulate the text of every match; plain "=" would keep only the last item
for items in dishes:
    dishes_ += items.find_all(text=True)
for items in descs:
    descs_ += items.find_all(text=True)

print(dishes_)
print(descs_)

descs holds the description of each dish, which I also want to scrape.

How do I get the full menu (and, if possible, the Google Maps link at the very bottom of the page)?

CodePudding user response:

To scrape the full menu and the Google Maps link at the very bottom of the page, you need to induce WebDriverWait for visibility_of_element_located(). You can use the following locator strategy:

  • Code Block:

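    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = Options()
    options.add_argument("start-maximized")
    # Pass both switches in one list: a second "excludeSwitches" call overwrites the first
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    driver.get('https://www.grubhub.com/restaurant/buca-di-beppo-1875-s-bascom-ave-campbell/335944')
    # Wait up to 20 seconds for the Google Maps link near the bottom of the page to be visible
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[data-testid='restaurant-about-google-map-link']")))
    print(driver.page_source)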
    
  • Console Output:

    <html lang="en" ><head><script type="text/javascript" async="" charset="utf-8" id="utag_367" src="//d.impactradius-event.com/A1231534-f0ec-4c6c-b14f-75a55231a9591.js"></script><script src="https://ext.chtbl.com/trackable.js"></script><script type="text/javascript" async="" charset="utf-8" src="https://www.googletagmanager.com/gtag/js?id=G-7YX8989VK2" id="utag_628"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/destination?id=G-7YX8989VK2&amp;l=dataLayer&amp;cx=c"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/js?id=G-7YX8989VK2&amp;l=dataLayer&amp;cx=c"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/destination?id=DC-11687855&amp;l=dataLayer&amp;cx=c"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/js?id=DC-11687855&amp;l=dataLayer&amp;cx=c"></script><script type="text/javascript" async="" charset="utf-8" id="utag_577" src="//js.adsrvr.org/up_loader.1.1.0.js"></script><script type="text/javascript" async="" charset="utf-8" src="//analytics.tiktok.com/i18n/pixel/events.js?sdkid=undefinedttq" id="utag_568"></script><script type="text/javascript" async="" charset="utf-8" id="utag_550" src="//mi.grubhub.com/p/js/1.js"></script><script src="https://www.redditstatic.com/ads/pixel.js" async=""></script><script type="text/javascript" async="" charset="utf-8" src="https://pixel.mathtag.com/event/js?version=1.1&amp;delimiter=,&amp;industry=Internet Services&amp;event_type=catchall&amp;mt_id=1427886&amp;mt_pp=1&amp;mt_adid=227305" id="utag_430"></script><script async="" src="//px.airpr.com/airpr.js"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/js?id=AW-987205382&amp;l=dataLayer&amp;cx=c"></script><script async="" src="https://sc-static.net/scevent.min.js"></script><script type="text/javascript" async="" charset="utf-8" id="utag_566" 
src="https://connect.facebook.net/en_US/fbevents.js"></script><script type="text/javascript" defer="" async="" src="https://collector-21091.us.tvsquared.com/tv2track.js"></script><script type="text/javascript" async="" charset="utf-8" src="//bat.bing.com/bat.js" id="utag_171"></script><script type="text/javascript" async="" charset="utf-8" src="https://www.google-analytics.com/analytics.js" id="tealium-tag-7110"></script><script type="text/javascript" async="" src="https://www.google-analytics.com/plugins/ua/linkid.js"></script><script type="text/javascript" src="https://bam-cell.nr-data.net/1/5923691cbd?a=11156950&amp;sa=1&amp;v=1216.487a282&amp;t=Unnamed Transaction&amp;ct=https://www.grubhub.com/restaurant&amp;rst=2434&amp;ck=1&amp;ref=https://www.grubhub.com/restaurant/buca-di-beppo-1875-s-bascom-ave-campbell/335944&amp;be=541&amp;fe=2213&amp;dc=986&amp;af=err,xhr,stn,ins,spa&amp;perf={"timing":{"of":1661037628304,"n":0,"f":1,"dn":2,"dne":71,"c":71,"s":100,"ce":166,"rq":166,"rp":479,"rpe":560,"dl":485,"di":987,"ds":987,"de":987,"dc":2213,"l":2213,"le":2218},"navigation":{}}&amp;fp=826&amp;fcp=1572&amp;ja={"diner_type":"diner_unknown","umami_app_version":"4.2.3852","ab_testing_status":"optimize enabled","clickstream_browser_id":"dec60c6c-11f2-4a3f-9f08-18cc784d5682","ad_block_enabled":true,"is_spider_bot":false,"clickstream_session_id":"ae778399-20de-11ed-a9d5-23c0dcc7cb7b","first-paint":826.5,"first-contentful-paint":1572.7999999523163,"fetchStart":1,"domainLookupStart":2,"domainLookupEnd":71,"connectStart":71,"connectEnd":166,"secureConnectionStart":100,"requestStart":166,"responseStart":479,"responseEnd":560,"domLoading":485,"domInteractive":987,"domContentLoadedEventStart":987,"domContentLoadedEventEnd":987,"domComplete":2213,"loadEventStart":2213}&amp;jsonp=NREUM.setToken"></script><script src="https://js-agent.newrelic.com/nr-spa-1216.min.js"></script><script type="text/javascript" async="" 
src="https://www.google-analytics.com/gtm/js?id=GTM-58CKX3J&amp;t=teal_grubhublabs_UniversalproductionStandard&amp;cid=1361115206.1661037630"></script><script src="https://cdn.ravenjs.com/3.26.4/raven.min.js"></script><script src="https://assets.grubhub.com/assets/dll/load-uuid-740f2944b2a1abda6733.js"></script>
    
        <link rel="manifest" href="https://assets.grubhub.com/manifest.json">
    
    
        <link rel="search" type="application/opensearchdescription+xml" title="Find food" href="/opensearch.xml">
    
    
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta charset="utf-8">
    .
    <div ><cb-icon ><svg  aria-hidden="true"><use xlink:href="#clock-back"></use></svg></cb-icon><span ><span  data-testid="menu-item-price" itemprop="price">$39.60</span><span data-testid="menu-item-price-plus" > </span></span></div></div></button></article></div></div></div></div></div></div></span></div></div></span></div></div></div></div></div></div></main><a name="reviews"></a><div><div data-testid="restaurant-about-reviews-sections" ><div ><div ><div id="navSection-about"  tabindex="0"><span data-testid="restaurant-about" id="ghs-restaurant-about"><div ><h2 data-testid="restaurantAbout-header">Buca di Beppo Menu Info</h2><div ><div data-testid="restaurantAbout-cuisines" ><a data-testid="restaurantAbout-cuisines--Dinner"  href="/delivery/ca-campbell/dinner">Dinner,♂</a><a data-testid="restaurantAbout-cuisines--Lunch Specials"  href="/delivery/ca-campbell/lunch_specials">Lunch Specials,♂</a><a data-testid="restaurantAbout-cuisines--Pasta"  href="/delivery/ca-campbell/pasta">Pasta,♂</a><a data-testid="restaurantAbout-cuisines--Pizza"  href="/delivery/ca-campbell/pizza">Pizza</a></div><span ><div data-testid="restaurant-price-rating"  title="$$$"><div data-testid="restaurant-price-rating-base" >$$$$$</div><div data-testid="restaurant-price-rating-value"  itemprop="priceRange">$$$</div></div></span></div><div ><div ><a data-testid="restaurant-about-google-map-link" href="https://maps.google.com?daddr=1875 S Bascom Ave Campbell CA 95008" target="_blank" rel="noopener"><span data-testid="static-map" ></span></a><a target="_blank" rel="noopener" data-testid="restaurant-about-address" href="http://maps.google.com/maps?daddr=1875 S Bascom Ave, Campbell, CA, 95008" ><div>1875 S Bascom Ave</div>Campbell, CA 95008</a><div ><button data-testid="restaurant-phone-button" itemprop="telephone" content="4083777722" ><span >(408) 377-7722</span></button></div><a href="/food/buca_di_beppo" data-testid="restaurantAbout-chainUrl"><div ><span>View more about </span>Buca di 
Beppo</div></a></div><div  data-testid="restaurant-hours"><h5 >Hours</h5><div ><span data-testid="days0">Today</span><div ><div  data-testid="pickupHours00">Pickup: 10:30am–9:30pm</div><div  data-testid="deliveryHours00">Delivery: 10:30am–9:30pm</div></div></div><button data-testid="show-full-schedule-link" >See the full schedule</button></div></div></div></span><span data-testid="ghs-impression-tracker" style="width: 100%;"><div data-testid="taking-orders-carousel"><span data-testid="restaurant-section-data" type="sponsored" ><div data-testid="in-view" ><span ><ghs-restaurant-carousel><div ><span ><div data-testid="carousel" ><div ><span data-testid="carousel-scroll-wrapper" ></span></div></div></span></div></ghs-restaurant-carousel></span></div></span></div></span></div><span id="navSection-reviews"  data-testid="ghs-impression-tracker"><div id="ghs-restaurant-reviews"  data-testid="restaurant-reviews"><div data-testid="in-view" ><div ><div  data-testid="restaurantReviews-container" id="restaurantPage-reviewHighlights"><div ><div ><div data-testid="facet-header" ><h2> Reviews for Buca di Beppo</h2><div ><span data-testid="star-rating-id"><div  data-testid="starRating"><span  data-testid="stars"><div  data-testid="stars-static" style="background-position: 0px -168px;"></div></span><span data-testid="star-rating-text" >208 <span>ratings</span></span></div></span></div><div ><span data-testid="review-section-rating-facets"><div  data-testid="ratingfacets"><div  data-testid="ratingsfacet-details"><p  data-testid="ratingsfacet-header">Here's what people are saying:</p><ul data-testid="ratingsfacet-facetlist" ><li ><span >88</span> <span >Food was good</span></li><li ><span >79</span> <span >Delivery was on time</span></li><li ><span >88</span> <span >Order was accurate</span></li></ul></div></div></span></div></div><div ></div></div></div><div  data-testid="restaurantReviews-body" impressionid="reviewBodyId"><div ><div  data-testid="allReviews-sortBar"><span 
></span></div></div></div><span></span></div></div></div></div></span><span data-testid="faqs"><div data-testid="faqs-container" ><div ><div ><div data-testid="faqs-heading" ><h2 >FAQs</h2></div><div data-testid="faqs-body-container" itemscope="" itemtype="http://schema.org/FAQPage"><div data-testid="faq-question"  itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>Does Buca di Beppo (1875 S Bascom Ave) deliver?</h6><div  data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper0">Yes, Buca di Beppo (1875 S Bascom Ave) delivery is available on Grubhub.</div></span></span></div></div><div data-testid="faq-question"  itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>Does Buca di Beppo (1875 S Bascom Ave) offer contact-free delivery?</h6><div  data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper1">Yes, Buca di Beppo (1875 S Bascom Ave) provides contact-free delivery with Grubhub.</div></span></span></div></div><div data-testid="faq-question"  itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>What type of food is Buca di Beppo (1875 S Bascom Ave)?</h6><div  data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper2">Buca di Beppo (1875 S Bascom Ave) is a Italian restaurant.</div></span></span></div></div><div data-testid="faq-question"  itemprop="mainEntity" itemscope="" 
itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>Is Buca di Beppo (1875 S Bascom Ave) eligible for Grubhub+ free delivery?</h6><div  data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper3">Yes, Grubhub offers free delivery for Buca di Beppo (1875 S Bascom Ave) with a <a href="https://www.grubhub.com/plus">Grubhub+</a> membership.</div></span></span></div></div></div></div></div></div></span>
    .
    <script type="text/javascript" id="tealium-script" src="https://tags.tiqcdn.com/utag/grubhubseamless/grubhub/prod/utag.js"></script><div><span data-testid="popover-content" id="ghs-popover-content-0"><aside  role="tooltip" style="inset: -10000px auto auto;"><div ></div><div ><span data-testid="closed-bag-popover"  style="min-height: 150px; min-width: 300px;"><aside id="ghs-globalCart-container"><span><span data-testid="global-cart"><div data-testid="global-cart-body" id="global-cart"  tabindex="-1"><span data-testid="sev-one"></span><section ><div ><div ></div><div ><h5 >Your bag is empty.</h5></div></div></section></div></span></span></aside></span></div><span ></span></aside></span></div><script type="text/javascript" id="clickstream-tag" src="https://assets.grubhub.com/libs/clickstreamjs/2.0.21/clickstream2.min.js"></script><script type="text/javascript" id="perimeter-x-script-tag" src="https://sensor.grubhub.com/O97ybH4J/init.js"></script><script type="text/javascript" id="app-boy-script" src="//assets.grubhub.com/libs/appboy/1.6/appboy.min.js"></script><script type="text/javascript" id="inauth-script-tag" src="https://www.cdn-net.com/cc.js?ts=1661037629801"></script><div><span data-testid="popover-content" id="ghs-popover-content-1"><aside  role="tooltip" style="inset: -10000px auto auto;"><div ></div><div ><div ><span data-testid="review-section-rating-facets"><div  data-testid="ratingfacets"><div  data-testid="ratingsfacet-details"><p  data-testid="ratingsfacet-header">Here's what people are saying:</p><ul data-testid="ratingsfacet-facetlist" ><li ><span >88</span> <span >Food was good</span></li><li ><span >79</span> <span >Delivery was on time</span></li><li ><span >88</span> <span >Order was accurate</span></li></ul></div></div></span></div></div><span ></span></aside></span></div><script src="https://cdn.branch.io/branch-latest.min.js" id="branch loader" async="true"></script><div id="ttdUniversalPixelTag" style="display: none;"></div></body></html>
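Once the page source is available, the Google Maps link and the menu prices can be pulled out by the data-testid attributes that appear in the output above. The following is a minimal sketch, not part of the original answer. It uses Python's standard-library html.parser so that it runs without extra packages; the class name DataTestIdCollector, the function name extract_link_and_prices, and the shortened sample string are illustrative assumptions. With BeautifulSoup, the equivalent lookup for the link would be soup.find('a', attrs={'data-testid': 'restaurant-about-google-map-link'}).

```python
from html.parser import HTMLParser


class DataTestIdCollector(HTMLParser):
    """Collect the map link href and price texts, keyed by data-testid."""

    def __init__(self):
        super().__init__()
        self.map_links = []      # href values of the Google Maps anchor
        self.prices = []         # text inside menu-item-price spans
        self._in_price = False   # True while inside a price span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        testid = attrs.get("data-testid")
        if tag == "a" and testid == "restaurant-about-google-map-link":
            self.map_links.append(attrs.get("href"))
        elif tag == "span" and testid == "menu-item-price":
            self._in_price = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def extract_link_and_prices(page_source):
    """Return (map_links, prices) found in an HTML string."""
    parser = DataTestIdCollector()
    parser.feed(page_source)
    parser.close()
    return parser.map_links, parser.prices


# Shortened sample built from fragments of the console output above
sample = (
    '<div><span data-testid="menu-item-price" itemprop="price">$39.60</span></div>'
    '<a data-testid="restaurant-about-google-map-link" '
    'href="https://maps.google.com?daddr=1875 S Bascom Ave Campbell CA 95008" '
    'target="_blank" rel="noopener"><span data-testid="static-map"></span></a>'
)

links, prices = extract_link_and_prices(sample)
print(links)
print(prices)
```

In the real script, driver.page_source would be passed in place of sample.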
    
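If some menu sections are still missing from the page source after the wait, one option is to scroll to the bottom repeatedly until the page height stops growing, and only then read driver.page_source. The sketch below is an assumption layered on top of the answer, not something the answer itself does: the helper name scroll_until_stable and its pause and max_rounds parameters are hypothetical, and it relies only on Selenium's driver.execute_script(). It cannot guarantee that the site's bot detection will not react to the scrolling.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=30):
    """Scroll to the bottom until document height stops changing.

    Returns the number of scrolls performed. `driver` is any object with
    Selenium's execute_script() method.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded sections time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded by the last scroll
        last_height = new_height
    return max_rounds  # gave up after max_rounds scrolls


# Usage with the driver from the code block above (not executed here):
#   scroll_until_stable(driver, pause=2.0)
#   page = driver.page_source
```

The max_rounds cap keeps the loop from running forever on pages that keep growing.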