Get text from a div with class and itemprop

I am trying to extract the text from this:

[<div  itemprop="name">Beno's Flowers &amp; Gifts</div>, <div 
 itemprop="name">Bluebird Diner</div>, <div 
 itemprop="name">Bread Garden Market</div>]

This is my code:

import requests
from bs4 import BeautifulSoup

 url = 'https://www.chomp.delivery/restaurants'

 headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
       'AppleWebKit/537.36 (KHTML, like Gecko) '\
       'Chrome/75.0.3770.80 Safari/537.36'}

 response = requests.get(url,headers=headers)
 soup = BeautifulSoup(response.text, "html.parser")

 restaurant_wrapper = soup.find(class_ = "dd_rest_list")
 restaurants = restaurant_wrapper.find_all(class_="menu__vendor-name", 
 itemprop="name")
 
 def extract_restaurant_data(restaurant):
   results = [
    {
        "title": print(title.text.strip())
    }
    for title in restaurant_details
    ]

  print(results)

  results = [extract_restaurant_data(restaurant) for restaurant in restaurants]

Output:

 AttributeError: 'tuple' object has no attribute 'text'

I am thinking that the issue is that each div has an itemprop, maybe this is the issue.

CodePudding user response：

Assuming your goal is to scrape several details for each restaurant and not only its name, change your strategy: iterate over each restaurant link, read the fields you need from it, and store them as one dict per restaurant in a list:

results = []

for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery' + restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
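The url entry above prefixes the site root onto each href, which only works when every href is site-relative and present. A minimal sketch of a more forgiving variant, using urljoin from the standard library; the helper name absolute_url and the BASE_URL constant are made up for illustration and are not part of the original answer:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.chomp.delivery"  # site root taken from the question


def absolute_url(href, base=BASE_URL):
    """Return an absolute URL for href, or None when the link has no href."""
    if not href:
        return None
    # urljoin leaves already-absolute URLs untouched and resolves relative ones.
    return urljoin(base, href)


print(absolute_url("/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City"))
print(absolute_url("https://example.com/elsewhere"))
print(absolute_url(None))
```

In the loop this would replace the string concatenation with absolute_url(restaurant.get('href')).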

Always check that the element you want to select exists before calling a method on it:

'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
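The inline conditional above calls find twice for the same element. One way to avoid that, sketched here on a small hand-written HTML sample rather than the live page, is a helper that looks the element up once and returns None when it is missing. The helper name text_or_none and the sample markup are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the snippets in the question; the second
# link deliberately has no address block.
SAMPLE_HTML = """
<div class="dd_rest_list">
  <a href="/r/1/a"><div itemprop="name">Bluebird Diner</div>
     <div itemprop="address"><span>330 E Market St.</span><span>Iowa City</span></div></a>
  <a href="/r/2/b"><div itemprop="name">Bread Garden Market</div></a>
</div>
"""


def text_or_none(parent, itemprop, separator=","):
    """Return the stripped text of the first div with this itemprop, or None."""
    node = parent.find("div", {"itemprop": itemprop})
    if node is None:
        return None
    return node.get_text(separator, strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = [
    {"title": text_or_none(link, "name"), "address": text_or_none(link, "address")}
    for link in soup.select(".dd_rest_list a")
]
print(results)
```

The same helper also protects the title lookup, which would otherwise raise AttributeError for a link without a name div.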

Example

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
   'AppleWebKit/537.36 (KHTML, like Gecko) '\
   'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

results = []

for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery' + restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
results

Output

[{'title': '2 Dogs Pub',
  'url': 'https://www.chomp.delivery/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City',
  'address': '1705 S 1st Ave Ste Q,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Alebrije Mexican Restaurant',
  'url': 'https://www.chomp.delivery/r/3316/restaurants/delivery/Mexican/Alebrije-Mexican-Restaurant-Iowa-City',
  'address': '401 S Linn st,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Ascended Electronics',
  'url': 'https://www.chomp.delivery/r/2521/restaurants/delivery/Retail/Ascended-Electronics-Iowa-City',
  'address': '208 Stevens Dr,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Aspen Leaf Frozen Yogurt',
  'url': 'https://www.chomp.delivery/r/522/restaurants/delivery/Ice-Cream-Sweets-Snacks/Aspen-Leaf-Frozen-Yogurt-Iowa-City',
  'address': '125 S Dubuque St,Iowa City,IA,52240',
  'and': 'so on'},...]

CodePudding user response：

You can get the text (the title) from each div[itemprop="name"] with a CSS selector, which is a more flexible approach.

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
       'AppleWebKit/537.36 (KHTML, like Gecko) '
       'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurants = soup.select('.dd_restlink')
for restaurant in restaurants:
    title=restaurant.select_one('div[itemprop="name"]').text
    print(title)

Output:

2 Dogs Pub
Alebrije Mexican Restaurant
Beno's Flowers & Gifts     
Blackstone
Bluebird Diner
Bo James
Bread Garden Market        
Carlos O'Kelly's
Cookies and More
Crepes de luxe cafe        
Deli Mart

... and so on
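To see the selector at work without hitting the live site, here is a small sketch that runs it against the three divs quoted in the question. The wrapping container is an assumption added for the example. It also shows the equivalent find_all call, and that BeautifulSoup decodes &amp; to & when returning text:

```python
from bs4 import BeautifulSoup

# The three divs quoted in the question, wrapped in a hypothetical container.
SAMPLE_HTML = """
<div class="dd_rest_list">
  <div itemprop="name">Beno's Flowers &amp; Gifts</div>
  <div itemprop="name">Bluebird Diner</div>
  <div itemprop="name">Bread Garden Market</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS attribute selector, as used in this answer.
via_select = [div.get_text(strip=True) for div in soup.select('div[itemprop="name"]')]

# Equivalent lookup with find_all and an attribute filter.
via_find_all = [div.get_text(strip=True) for div in soup.find_all("div", itemprop="name")]

print(via_select)
```

Both lists hold plain strings, so there is no need to call .text on the collection itself; the text is read from each element in turn.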

The address appears to be in div[itemprop="address"]:

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
       'AppleWebKit/537.36 (KHTML, like Gecko) '
       'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurants = soup.select('.dd_restlink')
for restaurant in restaurants:
    title=restaurant.select_one('div[itemprop="name"]').text
    #print(title)
    try:
        address=restaurant.select_one('div[itemprop="address"]').text
        print(address)
    except AttributeError:
        # select_one returned None: this restaurant has no address block
        pass

Output:

1705 S 1st Ave Ste QIowa CityIA52240


401 S Linn stIowa CityIA52240   


208 Stevens DrIowa CityIA52240  


125 S Dubuque StIowa CityIA52240


107 E Iowa Ave.Iowa CityIA52240


503 Westbury DrIowa CityIA52245


330 E Market St.Iowa CityIA52245


118 E Washington St.Iowa CityIA52240


225 S Linn St.Iowa CityIA52240


1406 S GilbertIowa CityIA52240


201 S Clinton StreetIowa CityIA52240


309 East College StreetIowa CityIA52240


206 E Benton St.Iowa CityIA52240


110 E College St.Iowa CityIA52240


519 east washingtonIowa CityIA52240


1705 S 1st AveIowa CityIA52240


1534 S Gilbert StIowa CityIA52240


457 S Gilbert StIowa CityIA52240


201 S Clinton St Suite 146Iowa CityIA52240


109 Iowa Ave.Iowa CityIA52240


220 Lafayette StIowa CityIA52240


214 N Linn St.Iowa CityIA52245


211 E Washington St.Iowa CityIA52240


717 Mormon Trek Blvd.Iowa CityIA52246


482 Hwy 1 WIowa CityIA52246


230 E Benton StreetIowa CityIA52240


227 E Washington StIowa CityIA52240


1575 S 1st AveIowa CityIA52240


223 S Gilbert St.Iowa CityIA52240


224 S Linn StIowa CityIA52240


9 S Dubuque St.Iowa CityIA52240


114 East Washington StIowa CityIA52240


11 S Dubuque St.Iowa CityIA52240


14 S Clinton St.Iowa CityIA52240


22 S Van Buren StIowa CityIA52240


5 S Dubuque St.Iowa CityIA52240


206 N. Linn StIowa CityIA52240


5 Sturgis Corner Dr.Iowa CityIA52246
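The street, city, state and ZIP run together in the output above because .text concatenates the strings of all nested elements with nothing in between. A minimal sketch of one way to separate them, using get_text with a separator and strip=True; the span-based markup below is a guess at the structure, and the real page may nest these parts differently:

```python
from bs4 import BeautifulSoup

# Hypothetical address markup; the real page may nest these parts differently.
SAMPLE_HTML = """
<div itemprop="address">
  <span>1705 S 1st Ave Ste Q</span><span>Iowa City</span><span>IA</span><span>52240</span>
</div>
"""

address = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one('div[itemprop="address"]')

# .text joins every nested string directly, so the parts run together.
run_together = address.text.strip()

# get_text with a separator and strip=True drops whitespace-only strings
# and joins the remaining parts with the separator.
separated = address.get_text(", ", strip=True)

print(run_together)
print(separated)
```

With strip=True the surrounding blank lines seen in the output above also disappear, since whitespace-only strings are skipped.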