How do I scrape the text from an element with multiple attributes?
<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>
I tried this:
category = soup.find(name="h2", attrs={"class": "_63-j _1rimQ", "data-qa": "heading"}).getText()
but it raises an error:
AttributeError: 'NoneType' object has no attribute 'getText'
The same error is raised when I use this:
category = soup.find(name="h2", class_="_63-j _1rimQ").getText()
CodePudding user response:
The content you want from that page is generated dynamically, so BeautifulSoup on its own will not find it in the HTML you downloaded. The page fetches its data by issuing a request to an API endpoint. Here is how you can get the same data using requests:
import requests

# API endpoint the page calls to load the restaurant data
link = 'https://cw-api.takeaway.com/api/v29/restaurant'
params = {
    'slug': 'c-pizza-c-kebab'
}
# Headers copied from the browser's request
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'x-country-code': 'fr',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link, params=params)
    # The products are stored as a dict under menu -> products
    container = res.json()['menu']['products']
    for key, val in container.items():
        print(val['name'])
Output (truncated):
Kebab veau
Pot de kebabs
Pot de frites
Margherita
Bambino
Reine
Sicilienne
Végétarienne
Calzone soufflée jambon
Calzone soufflée bœuf haché
Pêcheur
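If the response ever comes back without the menu or products key, the indexing above raises a KeyError. Here is a small sketch of a more defensive extraction, assuming the payload has the same shape as in the code above (a menu object whose products maps ids to objects with a name). The product_names helper and the sample payload are made up for illustration and are not part of the API:

```python
def product_names(payload):
    """Collect product names from a payload shaped like the response above."""
    # Fall back to empty dicts so a missing key yields an empty list, not a KeyError.
    products = payload.get("menu", {}).get("products", {})
    return [item["name"] for item in products.values() if "name" in item]


# Made-up sample data; the real response contains many more fields.
sample = {
    "menu": {
        "products": {
            "p1": {"name": "Kebab veau"},
            "p2": {"name": "Margherita"},
            "p3": {},  # an entry without a name is skipped
        }
    }
}

print(product_names(sample))  # ['Kebab veau', 'Margherita']
print(product_names({}))      # []
```

With the real request you would call it as product_names(res.json()).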
CodePudding user response:
from bs4 import BeautifulSoup as bs

# The tag needs the class attribute for the class_ lookup to match
html = """<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>"""
soup = bs(html, 'html.parser')
soup.find('h2', class_='_63-j _1rimQ').getText()  # 'Popular Dishes'
This works fine here. Maybe the difference is the 'html.parser' parser?
Tested with BeautifulSoup 4.10.0 and Python 3.10.2.
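Whichever parser is used, find() returns None when no matching tag exists in the HTML that was actually parsed, and calling getText() on that None is what produces the AttributeError from the question. Here is a minimal sketch that checks for None first and matches on data-qa only, on the assumption that the class names are auto-generated and may change. The heading_text helper and the shell_html sample are hypothetical:

```python
from bs4 import BeautifulSoup


def heading_text(html):
    """Return the text of the first <h2 data-qa="heading">, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the data-qa attribute only (assumption: the class names are unstable).
    tag = soup.find("h2", attrs={"data-qa": "heading"})
    # find() returns None when nothing matches, so check before reading the text.
    return tag.get_text(strip=True) if tag is not None else None


# Static markup that already contains the heading.
static_html = '<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>'
# Hypothetical shell page whose content would be filled in later by JavaScript.
shell_html = '<div id="root"></div><script src="app.js"></script>'

print(heading_text(static_html))  # Popular Dishes
print(heading_text(shell_html))   # None
```

If the second case is what you get for the real page, the heading is not in the downloaded HTML at all, and no change of parser or selector will find it.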