(Beginner) Python web scraping BeautifulSoup

Time:04-04

How can I scrape the text from an element that has multiple attributes?

<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>

I tried this:

category = soup.find(name="h2", attrs={"class":"_63-j _1rimQ","data-qa":"heading"}).getText()

but it raises this error:

AttributeError: 'NoneType' object has no attribute 'getText'

The same error is raised when I use this:

category = soup.find(name="h2",class_="_63-j _1rimQ")
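The error itself comes from find() returning None when no tag matches; calling .getText() on None raises exactly this AttributeError. As a minimal sketch, assuming a hypothetical HTML string in which the heading is actually present, the same attrs dict from the question does match, and checking for None first avoids the crash. heading_text is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: how the heading would look if it were in the page source.
HTML = '<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>'


def heading_text(html: str):
    """Return the text of the matching <h2>, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Same attrs dict as in the question: every key must match. A class value
    # containing a space is compared against the whole attribute string.
    tag = soup.find("h2", attrs={"class": "_63-j _1rimQ", "data-qa": "heading"})
    if tag is None:  # find() returns None when nothing matches
        return None
    return tag.get_text(strip=True)


print(heading_text(HTML))           # Popular Dishes
print(heading_text("<div></div>"))  # None
```

So the selector syntax is fine; if this still returns None on the real page, the element is simply not in the HTML that was downloaded.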

Answer:

The content you want from that page is generated dynamically (loaded by JavaScript after the page opens), so it is not in the HTML that BeautifulSoup receives. That is why find() returns None and .getText() fails. The page gets this data from an API endpoint, which you can call directly with requests:

import requests

link = 'https://cw-api.takeaway.com/api/v29/restaurant'
params = {
    'slug': 'c-pizza-c-kebab'
}

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'x-country-code': 'fr',
}

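with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link, params=params, timeout=10)
    res.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    # 'products' is a dict of product objects, each with a 'name' field
    container = res.json()['menu']['products']
    for val in container.values():
        print(val['name'])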

Output (truncated):

Kebab veau
Pot de kebabs
Pot de frites
Margherita
Bambino
Reine
Sicilienne
Végétarienne
Calzone soufflée jambon
Calzone soufflée bœuf haché
Pêcheur
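A slightly more defensive version of the same request, as a sketch: it sets a timeout, raises on HTTP errors, and separates fetching from parsing so the parsing step can be checked without network access. The endpoint, the slug parameter, and the menu > products > name layout are taken from the answer above and may change without notice; fetch_restaurant, product_names, and the sample payload are hypothetical.

```python
import requests

API_URL = "https://cw-api.takeaway.com/api/v29/restaurant"  # endpoint from the answer


def fetch_restaurant(slug: str, headers: dict, timeout: float = 10.0) -> dict:
    """Fetch the restaurant JSON (hypothetical helper; needs network access)."""
    with requests.Session() as s:
        s.headers.update(headers)
        res = s.get(API_URL, params={"slug": slug}, timeout=timeout)
        res.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
        return res.json()


def product_names(payload: dict) -> list:
    """Collect names, assuming payload['menu']['products'] is a dict of product dicts."""
    products = payload.get("menu", {}).get("products", {})
    return [p["name"] for p in products.values() if "name" in p]


# Hypothetical payload with the same nesting the answer relies on.
sample = {"menu": {"products": {"p1": {"name": "Margherita"}, "p2": {"name": "Reine"}}}}
print(product_names(sample))  # ['Margherita', 'Reine']
```

With network access, product_names(fetch_restaurant('c-pizza-c-kebab', headers)) should produce a listing like the one above, using the headers dict from the answer.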

Answer:

from bs4 import BeautifulSoup as bs
html = """<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>"""
soup = bs(html, 'html.parser')
soup.find('h2', class_ = '_63-j _1rimQ').getText() # 'Popular Dishes'

This works fine for me. Perhaps the difference is the parser; try 'html.parser'.

Tested with BeautifulSoup 4.10.0 and Python 3.10.2.
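One detail behind why class_="_63-j _1rimQ" matches here: BeautifulSoup treats class as a multi-valued attribute. A search string containing a space is compared against the whole attribute value in the same order, while a single class name matches any tag carrying that class. A CSS selector via select_one() does not care about order. This sketch reuses the HTML above (with the class attribute present); it only illustrates the matching rules and cannot help when the element is missing from the downloaded page.

```python
from bs4 import BeautifulSoup

html = '<h2 class="_63-j _1rimQ" data-qa="heading">Popular Dishes</h2>'
soup = BeautifulSoup(html, "html.parser")

exact = soup.find("h2", class_="_63-j _1rimQ")      # whole attribute string, same order
single = soup.find("h2", class_="_1rimQ")           # one class out of several
reordered = soup.find("h2", class_="_1rimQ _63-j")  # same classes, different order
# CSS selector: class order does not matter, and attributes can be combined
css = soup.select_one('h2._1rimQ._63-j[data-qa="heading"]')

print(exact is not None, single is not None, reordered is None)  # True True True
print(css.get_text())  # Popular Dishes
```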
