Whenever I try to extract the data, I get "None". I am not sure whether the problem is my code (I followed the usual bs4 conventions) or whether this website is simply harder to scrape.
My code:
import requests
import bs4 as bs
url = 'https://www.zomato.com/jakarta/pondok-indah-restaurants'
req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
html = req.text
soup = bs.BeautifulSoup(html, "html.parser")
listings = soup.find('div', class_='sc-gAmQfK fKxEbD')
rest_name = listings.find('h4', class_='sc-1hp8d8a-0 sc-eTyWNx gKsZcT').text
##Output: AttributeError: 'NoneType' object has no attribute 'find'
print(listings)
##returns None
Here is the inspected tag from the website. I am trying to get the h4 element that shows the restaurant's name:
CodePudding user response:
What happens?
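The class names are generated dynamically, so the HTML that requests receives may not contain the classes you see in the developer tools. In that case soup.find matches nothing and returns None, and calling .find on None raises the AttributeError.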
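To make the failure concrete, here is a minimal sketch on a made-up HTML snippet. The markup and the class name `sc-abc123` are assumptions for illustration only, not Zomato's real page. It shows that `find` returns `None` when the class does not match, and how a `None` check avoids the AttributeError:

```python
import bs4 as bs

# Hypothetical markup: the class differs from the one copied from dev tools.
html = '<div class="sc-abc123"><h4>Radio Dalam Diner</h4></div>'
soup = bs.BeautifulSoup(html, "html.parser")

# No div carries these classes, so find() returns None instead of raising.
listings = soup.find('div', class_='sc-gAmQfK fKxEbD')

# Check for None before chaining another lookup on the result.
if listings is None:
    rest_name = None
else:
    rest_name = listings.find('h4').text

print(listings, rest_name)
```

The guard does not make the data appear; it only turns the crash into a result you can test for while you look for a more reliable selector.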
How to fix?
A better approach is to select your targets by tag or by id, if available, because these are more stable than CSS classes. For example, this selects every link that contains an h4:
listings = soup.select('a:has(h4)')
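As a rough illustration of what `a:has(h4)` matches, the sketch below runs the selector on a small inline HTML string. The snippet and its class names `x1` and `y2` are invented for the example; only the selector comes from the answer. It assumes a Beautiful Soup version with CSS `:has()` support through Soup Sieve:

```python
import bs4 as bs

# Invented markup: two links wrap an h4, one link does not.
html = '''
<a href="/jakarta/radio-dalam-diner-pondok-indah/info"><h4 class="x1">Radio Dalam Diner</h4></a>
<a href="/jakarta/pondok-indah-restaurants">No heading here</a>
<a href="/jakarta/kfc-pondok-indah/info"><div><h4 class="y2">KFC</h4></div></a>
'''
soup = bs.BeautifulSoup(html, "html.parser")

# Select links by structure (an <a> with an <h4> descendant), not by class.
listings = soup.select('a:has(h4)')

# item.h4 returns the first h4 inside the link, however deeply it is nested.
titles = [item.h4.text for item in listings]
print(titles)
```

The selector ignores the class attributes entirely, so it should keep working when the generated class names change, as long as the link and heading structure stays the same.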
Example
Iterate over the listings and scrape several pieces of information:
import requests
import bs4 as bs
url = 'https://www.zomato.com/jakarta/pondok-indah-restaurants'
req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
html = req.text
soup = bs.BeautifulSoup(html, "html.parser")
data = []
for item in soup.select('a:has(h4)'):
    data.append({
        'title': item.h4.text,
        'url': item['href'],
        'etc': '...'
    })
print(data)
Output
[{'title': 'Radio Dalam Diner', 'url': '/jakarta/radio-dalam-diner-pondok-indah/info', 'etc': '...'}, {'title': 'Aneka Bubur 786', 'url': '/jakarta/aneka-bubur-786-pondok-indah/info', 'etc': '...'}, {'title': "McDonald's", 'url': '/jakarta/mcdonalds-pondok-indah/info', 'etc': '...'}, {'title': 'KOPIKOBOY', 'url': '/jakarta/kopikoboy-pondok-indah/info', 'etc': '...'}, {'title': 'Kopitelu', 'url': '/jakarta/kopitelu-pondok-indah/info', 'etc': '...'}, {'title': 'KFC', 'url': '/jakarta/kfc-pondok-indah/info', 'etc': '...'}, {'title': 'HokBen Delivery', 'url': '/jakarta/hokben-delivery-pondok-indah/info', 'etc': '...'}, {'title': 'PHD', 'url': '/jakarta/phd-pondok-indah/info', 'etc': '...'}, {'title': 'Casa De Jose', 'url': '/jakarta/casa-de-jose-pondok-indah/info', 'etc': '...'}]
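The url values in this output are relative paths. If absolute links are needed, one possible follow-up (not part of the answer above) is to join each path with the page URL using `urljoin` from the standard library. The `data` list here is a shortened copy of the output, and `full_urls` is a name made up for the sketch:

```python
from urllib.parse import urljoin

# Page URL from the question, used as the base for relative links.
url = 'https://www.zomato.com/jakarta/pondok-indah-restaurants'

# Shortened copy of the scraped output shown above.
data = [
    {'title': 'Radio Dalam Diner', 'url': '/jakarta/radio-dalam-diner-pondok-indah/info'},
    {'title': 'KFC', 'url': '/jakarta/kfc-pondok-indah/info'},
]

# A path starting with "/" replaces the path of the base URL.
full_urls = [urljoin(url, item['url']) for item in data]
print(full_urls)
```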