How do I use Selenium to get the menu items and prices for different restaurants by State?

Time:04-15

It's my first time working with Selenium and web scraping. I have been trying to get the menu items and prices for a certain restaurant in California from a website.

CodePudding user response:

* The website is protected by Cloudflare:

https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare CDN/Proxy!

https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare SSL!

** So I have to use the following options to make the automated browser less likely to be detected:

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Pass both switches in one call: a second "excludeSwitches" call would overwrite the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-blink-features=AutomationControlled")

*** To select the table's tr and td elements, I use CSS selectors, which are more robust and flexible.

**** I have to pass list(zip(...)) to the pandas DataFrame because the two lists do not have the same length.

***** I have to use try/except because, as you will see, some menu items are missing.
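One caveat about the list-and-zip note: zip stops at the shorter list, so a row that contributes to only one list shifts every later pair. The sketch below uses invented sample rows (not scraped data) to show the shift, and one possible way to avoid it by building one record per row. The names `rows`, `zipped`, and `records` are hypothetical.

```python
# Hypothetical rows as (menu text, price text); None marks a cell that was not found.
rows = [
    ("Mini", "$2.80"),
    ("Section header", None),  # a row without a price cell
    ("Small", "$4.84"),
]

# Two independent lists, filled only when the cell exists.
menu = [m for m, p in rows if m is not None]
price = [p for m, p in rows if p is not None]

# zip truncates to the shorter list and pairs items by position,
# so "$4.84" ends up next to "Section header" instead of "Small".
zipped = list(zip(price, menu))

# Alternative: decide per row, keeping a record only when both cells exist.
records = [(p, m) for m, p in rows if m is not None and p is not None]

print(zipped)
print(records)
```

With per-row records, each price stays next to the item from the same table row, at the cost of dropping rows that lack either cell.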

Script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup

# Options that make the automated browser less conspicuous
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Pass both switches in one call: a second "excludeSwitches" call would overwrite the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-blink-features=AutomationControlled")

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
driver.get(url)

# Wait for the dropdown to become visible, then pick the option by its value attribute
dropdown = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']"))
)
Select(dropdown).select_by_value("MS4yOA==")

# Parse the rendered page with BeautifulSoup, then shut the browser down
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

price = []
menu = []

for element in soup.select('#tablepress-34 tbody tr'):
    try:
        menu.append(element.select_one('td:nth-child(2)').text)
    except AttributeError:  # select_one returned None: no such cell in this row
        pass
    try:
        price.append(element.select_one('td:nth-child(3) span').text)
    except AttributeError:  # no price span in this row
        pass

# zip stops at the shorter list, so both DataFrame columns have the same length
df = pd.DataFrame(data=list(zip(price, menu)), columns=['price', 'menu'])
print(df)

Output:

    price      menu
0    $2.80     Mini
1    $4.84    Small
2    $5.61   Medium
3    $7.65    Large
4    $2.02     Kids
5    $2.53  Regular
6    $3.81    Large
7    $2.80     Mini
8    $6.39  Regular
9    $7.03
10   $7.03
11   $8.56
12   $7.67
13   $7.67
14   $7.67
15   $7.67
16   $4.47
17   $5.75
18   $6.64
19   $1.01
20   $1.27
21   $2.80
22   $3.57
23   $5.11
24   $1.27
25   $1.91
26   $1.91
27   $4.72     Mini
28   $6.00    Small
29   $7.28   Medium
30   $8.56    Large
31   $4.72     Mini
32   $6.00    Small
33   $7.28   Medium
34   $8.56    Large
35   $0.64
36   $4.72     Mini
37   $6.00    Small
38   $7.28   Medium
39   $8.56    Large
40   $4.72     Mini
41   $6.00    Small
42   $7.28   Medium
43   $8.56    Large
44   $7.67    Quart
45   $6.39     Pint
46  $10.23    Quart
47   $3.70
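
A note on the value passed to select_by_value in the script: "MS4yOA==" has the shape of a Base64 string. The sketch below decodes it with the standard library. What the site does with the decoded text (for example, using it as a per-state price factor) is an assumption, not something the page confirms. The helper name `decode_option_value` is hypothetical.

```python
import base64


def decode_option_value(value: str) -> str:
    """Decode a Base64-encoded option value into plain text."""
    return base64.b64decode(value).decode("ascii")


# The value used in the script above
decoded = decode_option_value("MS4yOA==")
print(decoded)  # 1.28
```

To find the value for another state, one option is to read option.get_attribute("value") for each entry in Select(dropdown).options and decode it the same way.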