get text from a div with class and itemprop

Time:04-13

I am trying to extract the text from this:

[<div itemprop="name">Beno's Flowers &amp; Gifts</div>,
 <div itemprop="name">Bluebird Diner</div>,
 <div itemprop="name">Bread Garden Market</div>]

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurant_wrapper = soup.find(class_="dd_rest_list")
restaurants = restaurant_wrapper.find_all(class_="menu__vendor-name",
                                          itemprop="name")

def extract_restaurant_data(restaurant):
    results = [
        {
            "title": print(title.text.strip())
        }
        for title in restaurant_details
    ]

    print(results)

results = [extract_restaurant_data(restaurant) for restaurant in restaurants]

Output:

 AttributeError: 'tuple' object has no attribute 'text'

I suspect the problem is that each div also has an itemprop attribute.

CodePudding user response:

I assume your goal is to scrape several details for each restaurant, not only its name. In that case, change your strategy: process the data in the same order you read it, and store it in a more structured form, a list of dicts:

results = []

for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery' + restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
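
The 'url' entry above builds an absolute link from the site root and each href. As a minimal sketch, the same result can be had from the standard library's urljoin, which also copes with hrefs that are already absolute. The names BASE_URL and absolute_url are hypothetical, and the sketch assumes the hrefs on the page are root-relative paths like the ones in the output of this answer.

```python
from urllib.parse import urljoin

# Assumption: the page the links were read from (taken from the question).
BASE_URL = "https://www.chomp.delivery/restaurants"

def absolute_url(href):
    """Resolve an href found on the page against the page URL."""
    return urljoin(BASE_URL, href)

# A root-relative href is resolved against the site root.
relative = absolute_url("/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City")
# An href that is already absolute is returned unchanged.
already_absolute = absolute_url("https://example.com/menu")

print(relative)
print(already_absolute)
```

Inside the loop this would be used as 'url': absolute_url(restaurant.get('href')).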

Always check that the element you want to select exists before calling a method on it:

'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
Example
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
   'AppleWebKit/537.36 (KHTML, like Gecko) '\
   'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text)

results = []

for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery' + restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
results

Output

[{'title': '2 Dogs Pub',
  'url': 'https://www.chomp.delivery/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City',
  'address': '1705 S 1st Ave Ste Q,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Alebrije Mexican Restaurant',
  'url': 'https://www.chomp.delivery/r/3316/restaurants/delivery/Mexican/Alebrije-Mexican-Restaurant-Iowa-City',
  'address': '401 S Linn st,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Ascended Electronics',
  'url': 'https://www.chomp.delivery/r/2521/restaurants/delivery/Retail/Ascended-Electronics-Iowa-City',
  'address': '208 Stevens Dr,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Aspen Leaf Frozen Yogurt',
  'url': 'https://www.chomp.delivery/r/522/restaurants/delivery/Ice-Cream-Sweets-Snacks/Aspen-Leaf-Frozen-Yogurt-Iowa-City',
  'address': '125 S Dubuque St,Iowa City,IA,52240',
  'and': 'so on'},...]
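
The existence check in this answer repeats the same find call twice per field. One possible way to keep it in a single place is a small helper, sketched below. The helper name text_or_none and the sample markup are hypothetical; the markup only imitates the structure this answer assumes (links inside .dd_rest_list holding divs with itemprop attributes) so that the sketch runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure assumed in the answer above.
SAMPLE_HTML = """
<div class="dd_rest_list">
  <a href="/r/1/first"><div itemprop="name"> First Place </div>
    <div itemprop="address"><span>1 Main St</span><span>Iowa City</span></div></a>
  <a href="/r/2/second"><div itemprop="name">Second Place</div></a>
</div>
"""

def text_or_none(parent, itemprop, separator=","):
    """Return the stripped text of the first div with this itemprop, or None."""
    node = parent.find("div", {"itemprop": itemprop})
    if node is None:
        return None
    return node.get_text(separator, strip=True)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

results = [
    {
        "title": text_or_none(link, "name"),
        "address": text_or_none(link, "address"),  # None when the div is missing
    }
    for link in soup.select(".dd_rest_list a")
]

print(results)
```

The second sample entry has no address div, so its 'address' value is None instead of raising an AttributeError.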

CodePudding user response:

You can get the text (the title) from each div[itemprop="name"] with a CSS selector, which is a more flexible approach.

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
       'AppleWebKit/537.36 (KHTML, like Gecko) '
       'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurants = soup.select('.dd_restlink')
for restaurant in restaurants:
    title=restaurant.select_one('div[itemprop="name"]').text
    print(title)

Output:

2 Dogs Pub
Alebrije Mexican Restaurant
Beno's Flowers & Gifts     
Blackstone
Bluebird Diner
Bo James
Bread Garden Market        
Carlos O'Kelly's
Cookies and More
Crepes de luxe cafe        
Deli Mart

... and so on
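
Since the question asks about a div with both a class and an itemprop, here is a small sketch of a single CSS selector that requires both. The sample markup is an assumption: it reuses the class name menu__vendor-name from the question's find_all call and is not copied from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name is taken from the question's code.
SAMPLE_HTML = """
<div class="dd_rest_list">
  <div class="menu__vendor-name" itemprop="name">Beno's Flowers &amp; Gifts</div>
  <div class="menu__vendor-name" itemprop="name">Bluebird Diner</div>
  <div class="menu__vendor-name">No itemprop here</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# div.<class>[itemprop="name"] matches only divs that carry both the class
# and the attribute; the third div is skipped.
titles = [
    div.get_text(strip=True)
    for div in soup.select('div.menu__vendor-name[itemprop="name"]')
]

print(titles)
```

The itemprop attribute does not get in the way of selecting by class; it simply narrows the match further.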

I think the address is in div[itemprop="address"]:

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
       'AppleWebKit/537.36 (KHTML, like Gecko) '
       'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurants = soup.select('.dd_restlink')
for restaurant in restaurants:
    title = restaurant.select_one('div[itemprop="name"]').text
    # print(title)
    # select_one returns None when an entry has no address block, so skip those
    address_div = restaurant.select_one('div[itemprop="address"]')
    if address_div is not None:
        print(address_div.text)

Output:

1705 S 1st Ave Ste QIowa CityIA52240


401 S Linn stIowa CityIA52240   


208 Stevens DrIowa CityIA52240  


125 S Dubuque StIowa CityIA52240


107 E Iowa Ave.Iowa CityIA52240


503 Westbury DrIowa CityIA52245


330 E Market St.Iowa CityIA52245


118 E Washington St.Iowa CityIA52240


225 S Linn St.Iowa CityIA52240


1406 S GilbertIowa CityIA52240


201 S Clinton StreetIowa CityIA52240


309 East College StreetIowa CityIA52240


206 E Benton St.Iowa CityIA52240


110 E College St.Iowa CityIA52240


519 east washingtonIowa CityIA52240


1705 S 1st AveIowa CityIA52240


1534 S Gilbert StIowa CityIA52240


457 S Gilbert StIowa CityIA52240


201 S Clinton St Suite 146Iowa CityIA52240


109 Iowa Ave.Iowa CityIA52240


220 Lafayette StIowa CityIA52240


214 N Linn St.Iowa CityIA52245


211 E Washington St.Iowa CityIA52240


717 Mormon Trek Blvd.Iowa CityIA52246


482 Hwy 1 WIowa CityIA52246


230 E Benton StreetIowa CityIA52240


227 E Washington StIowa CityIA52240


1575 S 1st AveIowa CityIA52240


223 S Gilbert St.Iowa CityIA52240


224 S Linn StIowa CityIA52240


9 S Dubuque St.Iowa CityIA52240


114 East Washington StIowa CityIA52240


11 S Dubuque St.Iowa CityIA52240


14 S Clinton St.Iowa CityIA52240


22 S Van Buren StIowa CityIA52240


5 S Dubuque St.Iowa CityIA52240


206 N. Linn StIowa CityIA52240


5 Sturgis Corner Dr.Iowa CityIA52246      
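
In the output above the street, city, state and ZIP code run together because .text concatenates the strings of the child elements without a separator. A minimal sketch of one way around that is get_text with a separator and strip=True. The sample markup is hypothetical and assumes each part of the address sits in its own child tag; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each address part in its own child tag.
SAMPLE_HTML = """
<div itemprop="address">
  <span>1705 S 1st Ave Ste Q</span>
  <span>Iowa City</span>
  <span>IA</span>
  <span>52240</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
node = soup.select_one('div[itemprop="address"]')

# Join the stripped text of every child string with ", ".
address = node.get_text(", ", strip=True)

print(address)
```

With strip=True, whitespace-only strings between the tags are dropped, so no empty fields appear in the joined result.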