Conditions in loop to ensure python only scrapes single div

Time:03-17

While attempting to scrape this website: https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/ I have located the food item names by doing the following:

import requests
from bs4 import BeautifulSoup

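# headers must be defined before use; a simple User-Agent is assumed here
headers = {'User-Agent': 'Mozilla'}

url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
req = requests.get(url, headers=headers)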
soup = BeautifulSoup(req.content, 'html.parser')

foodLocation = soup.find_all('div', class_='item-name')

for singleFood in foodLocation:
    food = singleFood.text
    print(food)

The problem is that I only want to print the foods inside the "World Palate Maize" section in the Lunch portion of the page. In the HTML, there are multiple divs, each containing the foods of a certain type (World Palate Maize, Hot Cereal, MBakery, etc.). I'm having trouble figuring out how to tell the loop to print only the items inside one section (one div?). This may require an if statement or some other condition in the for loop, but I am unsure how to format it or what to use as the condition so that the loop only prints the content from one section.

CodePudding user response:

It seems that "Lunch" is always the second div.courses block, so you can probably do:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla'
}

url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
# Pass headers by keyword; the second positional argument of requests.get is params
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')

[breakfast, lunch, dinner] = soup.select('div#mdining-items div.courses')
foods = lunch.select('div.item-name')

for food in foods:
    print(food.text)
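
To narrow this down further to a single section such as "World Palate Maize", one option is to find the section heading inside the lunch block and then collect only the item names under the same parent. The following is a minimal sketch, not tested against the live site: the sample markup, the h4 heading tag, the li wrapper around each section, and the helper name foods_in_section are all assumptions made for illustration, so check the real page's HTML and adjust the tag names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's structure is assumed, not verified.
SAMPLE_HTML = """
<div id="mdining-items">
  <div class="courses">
    <ul>
      <li><h4>Hot Cereal</h4>
        <ul><li><div class="item-name">Oatmeal</div></li></ul>
      </li>
    </ul>
  </div>
  <div class="courses">
    <ul>
      <li><h4>World Palate Maize</h4>
        <ul>
          <li><div class="item-name">Baked Scallops</div></li>
          <li><div class="item-name">Mojo Grilled Chicken</div></li>
        </ul>
      </li>
      <li><h4>MBakery</h4>
        <ul><li><div class="item-name">Chocolate Chunk Cookies</div></li></ul>
      </li>
    </ul>
  </div>
</div>
"""


def foods_in_section(container, section_name):
    """Return the item names listed under the heading that matches section_name."""
    for heading in container.find_all("h4"):
        if heading.get_text(strip=True) == section_name:
            # The enclosing <li> is assumed to wrap the whole section.
            section = heading.find_parent("li")
            return [d.get_text(strip=True) for d in section.select("div.item-name")]
    return []  # no heading with that name was found


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
meals = soup.select("div#mdining-items div.courses")
lunch = meals[1]  # second block, as in the answer above
lunch_foods = foods_in_section(lunch, "World Palate Maize")
print(lunch_foods)
```

Matching on the heading text instead of a position means the loop still picks the right section if the order of the sections changes from day to day.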

CodePudding user response:

The data you want from this URL is dynamic: it is generated by JavaScript, and BeautifulSoup cannot render JavaScript. So you need an automation tool such as Selenium together with BeautifulSoup. Just run the code below.

Script:

from bs4 import BeautifulSoup
import time
from selenium import webdriver
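from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(5)  # crude wait for the JavaScript-rendered content to load

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # the page source is already captured, so the browser can be closed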

items = soup.select('div.courses > ul > li > ul')
# Skip the first three lists; select_one returns only the first item name in each remaining list
for item in items[3:]:
    lunch_item = item.select_one('.item-name').text
    print(lunch_item)

Output:

Cream of Potato Soup 
Baked Scallops
Mojo Grilled Chicken
Tofu Banh Mi Sandwich
Cheese Pizza
Italian Turkey Burger
Chocolate Chunk Cookies 
Cream of Potato Soup
Texas Style Beef Brisket
Grilled Halal Honey Lime Chicken
Korean Tofu Power Bowl
Pepperoni Pizza
Italian Turkey Burger
Mississippi Mud Cake
    