I am trying to extract the text from this:
[<div itemprop="name">Beno's Flowers & Gifts</div>, <div
itemprop="name">Bluebird Diner</div>, <div
itemprop="name">Bread Garden Market</div>]
This is my code:
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
restaurant_wrapper = soup.find(class_ = "dd_rest_list")
restaurants = restaurant_wrapper.find_all(class_="menu__vendor-name",
                                          itemprop="name")

def extract_restaurant_data(restaurant):
    results = [
        {
            "title": print(title.text.strip())
        }
        for title in restaurant_details
    ]
    print(results)

results = [extract_restaurant_data(restaurant) for restaurant in restaurants]
Output:
AttributeError: 'tuple' object has no attribute 'text'
I suspect the problem is that each div has an itemprop attribute.
CodePudding user response:
Assuming your goal is to scrape several details for each restaurant and not only its name, change your strategy: process each restaurant as you read it and store the results in a more structured form, as a list of dicts:
results = []
for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title': restaurant.find('div', {'itemprop': 'name'}).text,
        # join the site root and the relative href with +
        'url': 'https://www.chomp.delivery' + restaurant.get('href'),
        'address': restaurant.find('div', {'itemprop': 'address'}).get_text(',', strip=True) if restaurant.find('div', {'itemprop': 'address'}) else None,
        'and': 'so on'
    })
Always check that the element you want to select exists before calling a method on it:
'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
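The same check can be pulled into a small helper so the lookup is not written twice per field. This is only a sketch: the helper name text_or_none and the sample markup are hypothetical, made up so the snippet runs without contacting the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link deliberately has no address element.
html = """
<div class="dd_rest_list">
  <a href="/r/1/first"><div itemprop="name">First Place</div>
    <div itemprop="address"><span>1 Main St</span><span>Iowa City</span></div></a>
  <a href="/r/2/second"><div itemprop="name">Second Place</div></a>
</div>
"""

def text_or_none(parent, itemprop, separator=","):
    """Return the text of the first div with the given itemprop, or None if absent."""
    tag = parent.find("div", {"itemprop": itemprop})
    if tag is None:
        # find() returns None when nothing matches, so stop before calling a method on it
        return None
    return tag.get_text(separator, strip=True)

soup = BeautifulSoup(html, "html.parser")
rows = [
    {"title": text_or_none(a, "name"), "address": text_or_none(a, "address")}
    for a in soup.select(".dd_rest_list a")
]
print(rows)
```

A missing element then shows up as None in the result instead of raising an AttributeError.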
Example
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url,headers=headers)
# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response.text, "html.parser")

results = []
for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title': restaurant.find('div', {'itemprop': 'name'}).text,
        # join the site root and the relative href with +
        'url': 'https://www.chomp.delivery' + restaurant.get('href'),
        # only call get_text() if the address element exists
        'address': restaurant.find('div', {'itemprop': 'address'}).get_text(',', strip=True) if restaurant.find('div', {'itemprop': 'address'}) else None,
        'and': 'so on'
    })

results  # in a notebook this displays the list; use print(results) in a script
Output
[{'title': '2 Dogs Pub',
'url': 'https://www.chomp.delivery/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City',
'address': '1705 S 1st Ave Ste Q,Iowa City,IA,52240',
'and': 'so on'},
{'title': 'Alebrije Mexican Restaurant',
'url': 'https://www.chomp.delivery/r/3316/restaurants/delivery/Mexican/Alebrije-Mexican-Restaurant-Iowa-City',
'address': '401 S Linn st,Iowa City,IA,52240',
'and': 'so on'},
{'title': 'Ascended Electronics',
'url': 'https://www.chomp.delivery/r/2521/restaurants/delivery/Retail/Ascended-Electronics-Iowa-City',
'address': '208 Stevens Dr,Iowa City,IA,52240',
'and': 'so on'},
{'title': 'Aspen Leaf Frozen Yogurt',
'url': 'https://www.chomp.delivery/r/522/restaurants/delivery/Ice-Cream-Sweets-Snacks/Aspen-Leaf-Frozen-Yogurt-Iowa-City',
'address': '125 S Dubuque St,Iowa City,IA,52240',
'and': 'so on'},...]
CodePudding user response:
You can get the text (the title) from each div[itemprop="name"] with a CSS selector, which is a more flexible approach.
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurants = soup.select('.dd_restlink')
for restaurant in restaurants:
    title = restaurant.select_one('div[itemprop="name"]').text
    print(title)
Output:
2 Dogs Pub
Alebrije Mexican Restaurant
Beno's Flowers & Gifts
Blackstone
Bluebird Diner
Bo James
Bread Garden Market
Carlos O'Kelly's
Cookies and More
Crepes de luxe cafe
Deli Mart
... and so on
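To try the selector without requesting the page, here is a minimal sketch that uses the three name divs from the question. The surrounding a.dd_restlink wrappers are an assumption about the page layout, added only so that the same select and select_one calls apply.

```python
from bs4 import BeautifulSoup

# The three name divs come from the question; the <a class="dd_restlink">
# wrappers are an assumption about the page layout.
html = """
<a class="dd_restlink"><div itemprop="name">Beno's Flowers &amp; Gifts</div></a>
<a class="dd_restlink"><div itemprop="name">Bluebird Diner</div></a>
<a class="dd_restlink"><div itemprop="name">Bread Garden Market</div></a>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every matching link; select_one() returns the first match inside each link
titles = [
    link.select_one('div[itemprop="name"]').get_text(strip=True)
    for link in soup.select(".dd_restlink")
]
print(titles)
```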
The address appears to be in div[itemprop="address"]:
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurants = soup.select('.dd_restlink')
for restaurant in restaurants:
    title = restaurant.select_one('div[itemprop="name"]').text
    # print(title)
    # select_one() returns None when nothing matches, so skip entries without an address
    address_tag = restaurant.select_one('div[itemprop="address"]')
    if address_tag is not None:
        print(address_tag.text)
Output:
1705 S 1st Ave Ste QIowa CityIA52240
401 S Linn stIowa CityIA52240
208 Stevens DrIowa CityIA52240
125 S Dubuque StIowa CityIA52240
107 E Iowa Ave.Iowa CityIA52240
503 Westbury DrIowa CityIA52245
330 E Market St.Iowa CityIA52245
118 E Washington St.Iowa CityIA52240
225 S Linn St.Iowa CityIA52240
1406 S GilbertIowa CityIA52240
201 S Clinton StreetIowa CityIA52240
309 East College StreetIowa CityIA52240
206 E Benton St.Iowa CityIA52240
110 E College St.Iowa CityIA52240
519 east washingtonIowa CityIA52240
1705 S 1st AveIowa CityIA52240
1534 S Gilbert StIowa CityIA52240
457 S Gilbert StIowa CityIA52240
201 S Clinton St Suite 146Iowa CityIA52240
109 Iowa Ave.Iowa CityIA52240
220 Lafayette StIowa CityIA52240
214 N Linn St.Iowa CityIA52245
211 E Washington St.Iowa CityIA52240
717 Mormon Trek Blvd.Iowa CityIA52246
482 Hwy 1 WIowa CityIA52246
230 E Benton StreetIowa CityIA52240
227 E Washington StIowa CityIA52240
1575 S 1st AveIowa CityIA52240
223 S Gilbert St.Iowa CityIA52240
224 S Linn StIowa CityIA52240
9 S Dubuque St.Iowa CityIA52240
114 East Washington StIowa CityIA52240
11 S Dubuque St.Iowa CityIA52240
14 S Clinton St.Iowa CityIA52240
22 S Van Buren StIowa CityIA52240
5 S Dubuque St.Iowa CityIA52240
206 N. Linn StIowa CityIA52240
5 Sturgis Corner Dr.Iowa CityIA52246
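The address parts above run together because .text joins the nested text nodes with no separator. Passing a separator to get_text() keeps them apart. The sketch below assumes the address div holds one child element per part; that markup is hypothetical and not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's address structure is assumed, not verified.
html = (
    '<div itemprop="address">'
    '<span>1705 S 1st Ave Ste Q</span><span>Iowa City</span>'
    '<span>IA</span><span>52240</span>'
    '</div>'
)

address_tag = BeautifulSoup(html, "html.parser").select_one('div[itemprop="address"]')

joined = address_tag.text                           # parts concatenated with no separator
separated = address_tag.get_text(", ", strip=True)  # parts joined with ", "
print(joined)
print(separated)
```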