While attempting to scrape this website: https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/ I have located the food item names by doing the following:
import requests
from bs4 import BeautifulSoup

# A User-Agent header is needed here or requests.get() below would raise a
# NameError; note that headers must be passed as a keyword argument, since
# the second positional argument of requests.get() is params, not headers.
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
foodLocation = soup.find_all('div', class_='item-name')
for singleFood in foodLocation:
    food = singleFood.text
    print(food)
The problem is, I only want to print the food inside the "World Palate Maize" section seen in the Lunch portion of the page. In the HTML, there are multiple divs, each containing the foods for a particular station (World Palate Maize, Hot Cereal, MBakery, etc.). I'm having trouble figuring out how to tell the loop to print only the items inside one section (one particular div?). This may require an if statement or a condition in the for loop, but I am unsure how to format it or what to use as the condition so that the loop prints content from just one section.
CodePudding user response:
It seems like "Lunch" would always be the second div, so you can probably do:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla'
}
url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')

# The page renders three div.courses blocks, one per meal, in order.
[breakfast, lunch, dinner] = soup.select('div#mdining-items div.courses')
foods = lunch.select('div.item-name')
for food in foods:
    print(food.text)
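If the section order ever changes, unpacking by position breaks. A more robust sketch is to locate a section by its heading text instead. The HTML below is a synthetic snippet mirroring the assumed structure of the menu page (station headings followed by lists of `item-name` divs); the real page's tag names may differ.

```python
from bs4 import BeautifulSoup

# Synthetic HTML mirroring the ASSUMED structure of the dining page:
# each station heading is followed by a list of div.item-name entries.
html = """
<div id="mdining-items">
  <div class="courses">
    <h4>World Palate Maize</h4>
    <ul>
      <li><div class="item-name">Mojo Grilled Chicken</div></li>
      <li><div class="item-name">Tofu Banh Mi Sandwich</div></li>
    </ul>
    <h4>Hot Cereal</h4>
    <ul>
      <li><div class="item-name">Oatmeal</div></li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# Find the heading whose text names the station, then walk forward to
# the first <ul> that follows it and collect its item names.
heading = soup.find(lambda tag: tag.name == 'h4' and 'World Palate Maize' in tag.text)
items = heading.find_next('ul').select('div.item-name')
foods = [i.text for i in items]
print(foods)  # ['Mojo Grilled Chicken', 'Tofu Banh Mi Sandwich']
```

This way the code keys off the station name rather than its position, so it keeps working if sections are reordered or a new station is added.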
CodePudding user response:
The data at this URL is dynamic, meaning it is generated by JavaScript, and BeautifulSoup can't render JavaScript. So you need an automation tool such as Selenium together with BeautifulSoup. Just run the code below.
Script:
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
# Selenium 4 expects the driver path wrapped in a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(5)  # give the JavaScript-rendered menu time to load

soup = BeautifulSoup(driver.page_source, 'html.parser')
# driver.quit()  # optionally close the browser once the page source is saved

# items[3:] skips the first three course lists; adjust the slice to
# target the section you want. select_one takes the first item name
# from each remaining list.
items = soup.select('div.courses > ul > li > ul')
for item in items[3:]:
    lunch_item = item.select_one('.item-name').text
    print(lunch_item)
Output:
Cream of Potato Soup
Baked Scallops
Mojo Grilled Chicken
Tofu Banh Mi Sandwich
Cheese Pizza
Italian Turkey Burger
Chocolate Chunk Cookies
Cream of Potato Soup
Texas Style Beef Brisket
Grilled Halal Honey Lime Chicken
Korean Tofu Power Bowl
Pepperoni Pizza
Italian Turkey Burger
Mississippi Mud Cake
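Note that the output above contains repeats ("Cream of Potato Soup" and "Italian Turkey Burger" each appear twice) because the slice spans more than one course list. If only unique names are wanted, a small sketch using `dict.fromkeys`, which keeps the first occurrence of each name in order:

```python
# The menu list below is the output printed above, verbatim.
menu = [
    "Cream of Potato Soup", "Baked Scallops", "Mojo Grilled Chicken",
    "Tofu Banh Mi Sandwich", "Cheese Pizza", "Italian Turkey Burger",
    "Chocolate Chunk Cookies", "Cream of Potato Soup",
    "Texas Style Beef Brisket", "Grilled Halal Honey Lime Chicken",
    "Korean Tofu Power Bowl", "Pepperoni Pizza", "Italian Turkey Burger",
    "Mississippi Mud Cake",
]

# dict keys are unique and insertion-ordered, so this de-duplicates
# while preserving the original menu order.
unique_menu = list(dict.fromkeys(menu))
print(unique_menu)
```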