I need to scrape the values "2015" and "09/09/2015" from the page linked below:
lacentrale.fr/auto-occasion-annonce-87102353714.html
But since there are many li and ul elements, I can't select the exact text. I used the code below. Your help is highly appreciated.
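from bs4 import BeautifulSoup
# HTML holds the downloaded page source
soup = BeautifulSoup(HTML, "html.parser")
# Finds only the first 'optionLabel' span, then the next span after it
soup.find('span', {'class':'optionLabel'}).find_next('span').get_text()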
CodePudding user response:
Try:
import requests
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0"
}
url = "https://www.lacentrale.fr/auto-occasion-annonce-87102353714.html"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
v1 = soup.select_one('.optionLabel:-soup-contains("Année") span')
v2 = soup.select_one(
'.optionLabel:-soup-contains("Mise en circulation") span'
)
print(v1.text)
print(v2.text)
Prints:
2015
09/09/2015
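The selector approach above can be tried offline with a small sketch. The HTML below is made up for illustration (the real page's markup may differ), and `option_value` is a hypothetical helper, not part of BeautifulSoup. It returns None instead of raising when a label is missing. Note that `:-soup-contains()` does substring matching, so a short label can also match a longer one that contains it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch; the real page may differ.
SAMPLE_HTML = """
<ul>
  <li class="optionLabel"><button>Année</button><span>2015</span></li>
  <li class="optionLabel"><button>Mise en circulation</button><span>09/09/2015</span></li>
  <li class="optionLabel"><button>Énergie</button><span>Electrique</span></li>
</ul>
"""


def option_value(soup, label):
    """Return the text of the first span inside the .optionLabel element
    whose text contains `label`, or None if nothing matches."""
    node = soup.select_one(f'.optionLabel:-soup-contains("{label}") span')
    return node.get_text(strip=True) if node else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
year = option_value(soup, "Année")
first_registration = option_value(soup, "Mise en circulation")
missing = option_value(soup, "Garantie")  # label absent from the sample

print(year, first_registration, missing, sep="\n")
```

Guarding against None matters on a live page, where a listing may simply not include a given option.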
CodePudding user response:
I am a fan of CSS selectors and :-soup-contains(), as mentioned in @Andrej's answer. Here is an alternative approach, in case more of the options are needed.
Generate a dict of all options, keyed by option label, and pick the relevant values from it:
data = dict((e.button.text,e.find_next('span').text) for e in soup.select('.optionLabel'))
data looks like:
{'Année': '2015', 'Mise en circulation': '09/09/2015', 'Contrôle technique': 'requis', 'Kilométrage compteur': '68 736 Km', 'Énergie': 'Electrique', 'Rechargeable': 'oui', 'Autonomie batterie': '190 Km', 'Capacité batterie': '22 kWh', 'Boîte de vitesse': 'automatique', 'Couleur extérieure': 'gris foncé metal', 'Couleur intérieure': 'cuir noir', 'Nombre de portes': '5', 'Nombre de places': '4', 'Garantie': '6 mois', 'Première main (déclaratif)': 'non', 'Nombre de propriétaires': '2', 'Puissance fiscale': '3 CV', 'Puissance din': '102 ch', 'Puissance moteur': '125 kW', "Crit'Air": '0', 'Émissions de CO2': '0 g/kmA', 'Norme Euro': 'EURO6', 'Prime à la conversion': ''}
Example
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36'}
url = 'https://www.lacentrale.fr/auto-occasion-annonce-87102353714.html'
soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')
data = dict((e.button.text,e.find_next('span').text) for e in soup.select('.optionLabel'))
print(data['Année'], data['Mise en circulation'], sep='\n')
Output
2015
09/09/2015
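If some entries on a page lack a button or a value, the one-line dict expression raises an AttributeError. A slightly more defensive variant might look like the sketch below. The HTML is again made up for illustration, and `options_to_dict` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch; the last entry has no button.
SAMPLE_HTML = """
<ul>
  <li class="optionLabel"><button> Année </button><span>2015</span></li>
  <li class="optionLabel"><button>Mise en circulation</button><span>09/09/2015</span></li>
  <li class="optionLabel"><span>orphan value</span></li>
</ul>
"""


def options_to_dict(soup):
    """Map each option label (button text) to the text of the next span,
    skipping entries where either part is missing."""
    data = {}
    for element in soup.select(".optionLabel"):
        button = element.find("button")
        value = element.find_next("span")
        if button is None or value is None:
            continue  # incomplete entry, nothing to record
        data[button.get_text(strip=True)] = value.get_text(strip=True)
    return data


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
data = options_to_dict(soup)

# dict.get avoids a KeyError when a listing does not show a given option.
print(data.get("Année"), data.get("Mise en circulation"), data.get("Garantie", "n/a"), sep="\n")
```

Stripping whitespace with `get_text(strip=True)` keeps the keys stable even when the markup pads the label text.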