It's my first time working with Selenium and web scraping. I have been trying to get the menu items and prices for a certain restaurant in California from the following website (
CodePudding user response:
* The website is using Cloudflare protection:
https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare CDN/Proxy!
https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare SSL!
** So I have to use the following options to evade detection:
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Both switches go in one list: a second "excludeSwitches" call would overwrite the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-blink-features=AutomationControlled")
*** To select the table rows and cells (tr, td), I use CSS selectors, which are more robust and flexible.
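**** I have to build the pandas DataFrame from list(zip(price, menu)) because the two lists do not always have the same length. Passing lists of unequal length as columns raises a ValueError, whereas zip stops at the shorter list.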
***** I have to use try/except because, as you will see, some menu items are missing.
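The last two notes can be tried out without a browser. This is only a small sketch: the rows list is made-up sample data standing in for the scraped table, with None marking a missing cell. It shows what zip does when one list ends up shorter, and how appending a placeholder instead of skipping keeps one entry per row.

```python
# Hypothetical rows standing in for the scraped table: (menu cell, price cell).
# None means the cell was missing from that row.
rows = [("Mini", "$2.80"), (None, "$7.03"), ("Large", "$7.65")]

# Skipping a missing cell (except: pass) leaves the lists with different lengths.
menu = [m for m, _ in rows if m is not None]
price = [p for _, p in rows if p is not None]

# zip() pairs values by position and stops at the shorter list,
# so "$7.03" is paired with "Large" and the last price is dropped.
skipped = list(zip(price, menu))

# Appending an empty string for a missing cell keeps the rows aligned.
menu_padded = [m if m is not None else "" for m, _ in rows]
aligned = list(zip(price, menu_padded))

print(skipped)
print(aligned)
```

So zip avoids the shape error, but if cells are really skipped the pairs can shift. A placeholder in the except branch is one way to keep each price next to its own menu item.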
Script:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Both switches go in one list: a second "excludeSwitches" call would overwrite the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-blink-features=AutomationControlled")

# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
driver.get(url)

# Wait for the dropdown to become visible, then select the option by its value.
dropdown = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']"))
)
Select(dropdown).select_by_value("MS4yOA==")

# Parse the rendered page with BeautifulSoup, then shut the browser down.
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

price = []
menu = []
for element in soup.select("#tablepress-34 tbody tr"):
    # select_one() returns None when a cell is missing, so .text raises AttributeError.
    try:
        menu.append(element.select_one("td:nth-child(2)").text)
    except AttributeError:
        pass
    try:
        price.append(element.select_one("td:nth-child(3) span").text)
    except AttributeError:
        pass

df = pd.DataFrame(data=list(zip(price, menu)), columns=["price", "menu"])
print(df)
Output:
     price     menu
0    $2.80     Mini
1    $4.84    Small
2    $5.61   Medium
3    $7.65    Large
4    $2.02     Kids
5    $2.53  Regular
6    $3.81    Large
7    $2.80     Mini
8    $6.39  Regular
9    $7.03
10   $7.03
11   $8.56
12   $7.67
13   $7.67
14   $7.67
15   $7.67
16   $4.47
17   $5.75
18   $6.64
19   $1.01
20   $1.27
21   $2.80
22   $3.57
23   $5.11
24   $1.27
25   $1.91
26   $1.91
27   $4.72     Mini
28   $6.00    Small
29   $7.28   Medium
30   $8.56    Large
31   $4.72     Mini
32   $6.00    Small
33   $7.28   Medium
34   $8.56    Large
35   $0.64
36   $4.72     Mini
37   $6.00    Small
38   $7.28   Medium
39   $8.56    Large
40   $4.72     Mini
41   $6.00    Small
42   $7.28   Medium
43   $8.56    Large
44   $7.67    Quart
45   $6.39     Pint
46  $10.23    Quart
47   $3.70
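
A side note on the value passed to select_by_value in the script: "MS4yOA==" looks like a Base64 string, and decoding it with the standard library gives a plain number. What that number means on the site (for example a regional price multiplier) is only a guess on my part; the snippet just shows the decoding.

```python
import base64

# Option value copied from the select_by_value() call in the script.
option_value = "MS4yOA=="

# Decode the Base64 text back to the ASCII string it encodes.
decoded = base64.b64decode(option_value).decode("ascii")
print(decoded)  # 1.28
```

If you need a different option from the dropdown, inspecting the option tags in the page source and decoding their values the same way may help identify the right one.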