I am trying to get the price and odometer reading for cars listed on a car sales site, so that I can track when a specific model was listed and when it disappeared. A page may return one or many cars. I am new to both Python and BeautifulSoup, and have most likely bitten off more than I can chew.
I managed to request the page and find the div containers, each holding the details for one car.
I can iterate through the list of cars, but cannot address or extract the nested tags for each car.
# import libraries
from bs4 import BeautifulSoup
import requests
# Request to website and download HTML contents
url = 'https://www.carsales.com.au/cars/2011/mercedes-benz/s-class/s350-badge/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
response = requests.get(url, headers=headers)
response_code = response.status_code
if response_code != 200:
    print(f"Error fetching page: {response_code}")
    exit()
else:
    content = response.content
soup = BeautifulSoup(content, 'html.parser')
# <div class="card-body">
SELECTOR_CAR = "card-body"
# <a class="js-encode-search" data-webm-clickvalue="sv-price" href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990* <span class="currency"></span></a>
SELECTOR_PRICE = ""
# <ul class="key-details">
# <li class="key-details__value" data-type="Odometer">95,121 km</li>
SELECTOR_ODO = ""
# find all cars on page
# class is a Python reserved word; use class_ instead
cars = soup.find_all(class_ = SELECTOR_CAR)
# ----- my original version
formatted_cars = [] # array for car details
for car in cars:
    print("==========")
    data = {
        'title': car('js-encode-search'),
        'price': car('key-details__value')
    }
    formatted_cars.append(data)
    #car_soup = BeautifulSoup(car, 'html.parser')
    #print(car_card.prettify)
    #print(car_card)
print(formatted_cars)
# ----- end original
# ----- modified later
for car in cars:
    print("==========")
    for child in car.a.children:
        print(child)
    car_odo = car.li.contents
    print(car_odo)
# ----- modified later end
The modified version of the for loop results in:
python3 getCarsales_S350.py
9 Mercedes-Benz S-Class S350 cars for sale in Australia
9
==========
2009 Mercedes-Benz S-Class S350 Auto MY08
['181,150 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['291,153 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['192,851 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['78,606 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['38,806 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['172,012 km']
==========
2010 Mercedes-Benz S-Class S350 L Auto MY10
['77,800 km']
==========
2010 Mercedes-Benz S-Class S350 Auto MY10
['143,000 km']
==========
2011 Mercedes-Benz S-Class S350 Auto MY10
['95,121 km']
... which works by accident rather than by design, as shown by the fact that I cannot get the price. The odometer and title just happen to be the first matching elements.
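The reason only the title and odometer come out: attribute-style access such as car.a or car.li is BeautifulSoup shorthand for car.find('a') and car.find('li'), which return the first matching descendant only. A minimal sketch on a trimmed copy of one car container (the HTML string is shortened for illustration, and the variable names are only for this example):

```python
from bs4 import BeautifulSoup

# Trimmed copy of one car container, shortened for illustration.
html = """
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title" href="#">2011 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search" data-webm-clickvalue="sv-price" href="#">$40,990* <span class="currency"></span></a></div>
  <ul class="key-details">
    <li class="key-details__value" data-type="Odometer">95,121 km</li>
    <li class="key-details__value" data-type="Body Style">Sedan</li>
  </ul>
</div>
"""
car = BeautifulSoup(html, "html.parser").find(class_="card-body")

first_link = car.a   # same as car.find("a"): the first <a>, here the title link
first_item = car.li  # same as car.find("li"): the first <li>, here the odometer

print(first_link.get_text(strip=True))
print(first_item.get_text(strip=True))
```

The price link is the second <a> in the container, so car.a never reaches it.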
Here is a single car container:
<div class="card-body">
<div class="row">
<div class="col">
<h3>
<a class="js-encode-search" data-webm-clickvalue="sv-title"
href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">2011
Mercedes-Benz S-Class S350 Auto MY10</a>
</h3>
</div>
<div class="col-12 col-xl-5 text-right">
<div class="item-price">
<div class="price">
<a class="js-encode-search" data-webm-clickvalue="sv-price"
href="/cars/details/2011-mercedes-benz-s-class-s350-auto-my10/OAG-AD-19752647/?Cr=8">$40,990*
<span class="currency"></span></a>
</div>
<div class="price-info-container">
<a class="price-info" data-target-url="/_details/api/v1/price-guide/carsales/OAG-AD-19752647"
data-toggle="lightbox" data-webm-clickvalue="sv-price-label">
Excl. Govt. Charges
</a>
<a class="additional-price-info"
data-target-url="/_details/api/v1/price-guide/carsales/OAG-AD-19752647"
data-toggle="lightbox"></a>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col">
<ul class="key-details">
<li class="key-details__value" data-type="Odometer">95,121 km</li>
<li class="key-details__value" data-type="Body Style">Sedan</li>
<li class="key-details__value" data-type="Transmission">Automatic</li>
<li class="key-details__value" data-type="Engine">6cyl 3.5L Petrol</li>
</ul>
<a class="xfacts-report" data-lightbox-height="650" data-lightbox-onclosed="onFactsPlusModalClosed"
data-lightbox-width="900" data-opm-event="click-facts-driver-listings"
data-opm-exp="facts-driver-listings" data-opm-trackon="click" data-seller-type="dealer"
data-smart-buyer-network-id="OAG-AD-19752647"
data-target-url="/smartbuyer/popup?networkId=OAG-AD-19752647&sourcesystem=desktop.carsales-dealer.listing-carfacts.buy.textlink&driver_crosssell=desktop.carsales-dealer.listing-carfacts.buy.textlink"
data-toggle="lightbox" data-webm-clickvalue="get-carfacts-report">
Pricing & history on this car - FACTS
</a>
</div>
<div class="col-12 col-xl-4 text-right d-flex align-items-start badge-csn">
</div>
</div>
</div>
CodePudding user response:
What happens
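Several tags carry the class js-encode-search (both the title link and the price link), and car('js-encode-search') is shorthand for find_all(), so it returns a list of results rather than a single tag. Note also that a bare string argument is matched against tag names, not classes; to match a class, pass class_='js-encode-search'.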
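To illustrate the behaviour: calling a tag directly is BeautifulSoup shorthand for find_all() on that tag. A minimal sketch, using a shortened copy of the container (the HTML string is trimmed for illustration, and by_name and by_class are names made up for this example):

```python
from bs4 import BeautifulSoup

# Shortened copy of one car container, trimmed for illustration.
html = """
<div class="card-body">
  <h3><a class="js-encode-search" href="#">2011 Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price"><a class="js-encode-search" href="#">$40,990*</a></div>
</div>
"""
car = BeautifulSoup(html, "html.parser").find(class_="card-body")

# car(...) is the same as car.find_all(...).
by_name = car("js-encode-search")          # looks for a <js-encode-search> tag: none exist
by_class = car(class_="js-encode-search")  # looks for the CSS class: title and price links

print(len(by_name), len(by_class))
for tag in by_class:
    print(tag.get_text(strip=True))
```

Either way the result is a list, which is why a more specific selector that returns a single tag is easier to work with.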
How to fix
Make your selector more specific. The title sits in the <a> inside a parent <h3>:
soup.select_one('h3 a')
Example
soup = BeautifulSoup(content, 'html.parser')
formatted_cars = [] # array for car details
for car in cars:
    print("==========")
    data = {
        'title': ' '.join(soup.select_one('h3 a').get_text(strip=True).split()),
        'price': soup.select_one('div.price a').get_text(strip=True)
    }
    formatted_cars.append(data)
print(formatted_cars)
Output
==========
[{'title': '2011 Mercedes-Benz S-Class S350 Auto MY10', 'price': '$40,990*'}]
CodePudding user response:
The selected answer is correct for one car.
To get all the cars, the for loop needs to look like this:
formatted_cars = [] # array for car details
for car in cars:
    print("==========")
    data = {
        'title': ' '.join(car.select_one('h3 a').get_text(strip=True).split()),
        'price': car.select_one('div.price a').get_text(strip=True),
        'odo': car.select_one('ul.key-details li').get_text(strip=True)
    }
    #print(data)
    formatted_cars.append(data)
print(formatted_cars)
The selectors are called on each car from cars, not on the whole soup, so every lookup is scoped to a single listing. (Hope this makes sense.)
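Since 'ul.key-details li' still relies on the odometer being the first list item, a possible variation is to select on the attributes visible in the container HTML (data-type="Odometer" and data-webm-clickvalue="sv-price") and convert the values to integers for later comparison. This is only a sketch under assumptions: the helper names to_int and parse_car are made up for illustration, the HTML is a trimmed copy of the container from the question, and the site's markup may of course change.

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the car container from the question.
html = """
<div class="card-body">
  <h3><a class="js-encode-search" data-webm-clickvalue="sv-title" href="#">2011
      Mercedes-Benz S-Class S350 Auto MY10</a></h3>
  <div class="price">
    <a class="js-encode-search" data-webm-clickvalue="sv-price" href="#">$40,990*
      <span class="currency"></span></a>
  </div>
  <ul class="key-details">
    <li class="key-details__value" data-type="Odometer">95,121 km</li>
    <li class="key-details__value" data-type="Body Style">Sedan</li>
  </ul>
</div>
"""

def to_int(text):
    """Hypothetical helper: keep only the digits, e.g. '$40,990*' -> 40990."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None

def parse_car(car):
    """Hypothetical helper: extract the fields of one card-body container."""
    title = car.select_one('a[data-webm-clickvalue="sv-title"]')
    price = car.select_one('a[data-webm-clickvalue="sv-price"]')
    odo = car.select_one('li[data-type="Odometer"]')
    return {
        # split()/join collapses the line break inside the title text
        "title": " ".join(title.get_text().split()) if title else None,
        "price": to_int(price.get_text()) if price else None,
        "odo_km": to_int(odo.get_text()) if odo else None,
    }

soup = BeautifulSoup(html, "html.parser")
formatted_cars = [parse_car(car) for car in soup.find_all(class_="card-body")]
print(formatted_cars)
```

Returning None for a missing element keeps one incomplete listing from raising an AttributeError and stopping the whole loop.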