I am trying to scrape a website whose data is split across multiple tables. My plan is to use 3 variables (oem, model, leadtime) to generate the desired output, but I cannot figure out how to extract the page's data into these 3 variables. As I am new to Python and BeautifulSoup, I would highly appreciate your feedback. Thanks in advance!
Desired output, using the 3 variables and the command print(oem, model, leadtime):
Audi, A1 Sportback, 27 weeks
Audi, A3 Sportback, 27 weeks
...
Volvo, XC90, 27 weeks
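Note that print() separates its arguments with a single space by default, so print(oem, model, leadtime) on its own gives "Audi A1 Sportback 27 weeks" without the commas. A minimal sketch, with sample values hard-coded from the list above and a hypothetical helper name format_row, of getting the comma-separated form through print's sep parameter:

```python
# Sample values copied from the desired output above (hard-coded for illustration)
rows = [
    ("Audi", "A1 Sportback", "27 weeks"),
    ("Audi", "A3 Sportback", "27 weeks"),
    ("Volvo", "XC90", "27 weeks"),
]


def format_row(oem: str, model: str, leadtime: str) -> str:
    """Hypothetical helper: join the three fields the same way print(..., sep=", ") does."""
    return ", ".join((oem, model, leadtime))


for oem, model, leadtime in rows:
    # sep=", " replaces print's default single-space separator
    print(oem, model, leadtime, sep=", ")
```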
Current error:
AttributeError: 'NavigableString' object has no attribute 'select'
Current code:
from bs4 import BeautifulSoup
import requests

response = requests.get("https://www.carwow.co.uk/new-car-delivery-times#gref").text
soup = BeautifulSoup(response, 'html.parser')

for tbody in soup.select('tbody'):
    for tr in tbody:
        oem = tr.select('td > a')[0].get('href').split('/')[3].capitalize()
        model = tr.select('td > a')[0].get('href').split('/')[4].capitalize()
        lead_time = tr.select('td')[1].getText(strip=True)
        print(oem, model, lead_time)
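To see where the AttributeError comes from without hitting the live site, here is a minimal reproduction on a small hypothetical table (the markup is made up, not copied from carwow). It compares the children produced by "for tr in tbody:" with the elements returned by tbody.select("tr"):

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical miniature table; the real page markup is not reproduced here
html = """
<table>
  <tbody>
    <tr><th>Car</th><th>Lead time</th></tr>
    <tr><td><a href="#">Audi A1 Sportback</a></td><td>27</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.select_one("tbody")

# Iterating a Tag walks its direct children, including the whitespace
# strings between the <tr> tags; those are NavigableString objects
print([type(child).__name__ for child in tbody])

# select("tr") returns only the <tr> elements, and each Tag supports .select()
rows = tbody.select("tr")
print([type(row).__name__ for row in rows])
```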
Answer:
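The error comes from "for tr in tbody:". Iterating over a Tag yields all of its direct children, including the whitespace strings (NavigableString objects) between the <tr> tags, and a NavigableString has no .select() method. Select the rows explicitly with tbody.select("tr") instead. Try: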
import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://www.carwow.co.uk/new-car-delivery-times#gref"
).text
soup = BeautifulSoup(response, "html.parser")

for tbody in soup.select("tbody"):  # for each table
    for tr in tbody.select("tr")[1:]:  # skip header row
        brand, leadtime = [
            td.get_text(strip=True, separator=" ") for td in tr.select("td")
        ][:2]
        oem, model = brand.split(maxsplit=1)
        print("{:<20} {:<20} {}".format(oem, model, leadtime))
Prints:
...
Toyota               RAV4                 45
Toyota               Yaris                18
Volkswagen           Golf                 41
Volkswagen           Golf GTI             45
Volkswagen           Golf R               45
Volkswagen           Polo                 32
Volkswagen           T-Cross              23
Volkswagen           Tiguan               52
Volkswagen           Touareg              52
Volkswagen           ID3                  52
Volkswagen           ID4                  52
Volvo                V60                  27
Volvo                V90                  27
Volvo                XC40                 18
Volvo                XC60                 27
Volvo                XC90                 27
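The answer's loop can be adapted to print the exact "Audi, A1 Sportback, 27 weeks" format from the question. A sketch under several assumptions, none verified against the live page: each data row has the car name in its first <td> and a lead time in weeks in its second (as the output above suggests), and the first row of each table is a header. SAMPLE_HTML, MULTI_WORD_BRANDS, split_brand and parse_lead_times are hypothetical names; the HTML is a made-up stand-in built from rows shown above so the parsing can run offline. split_brand also guards against a limitation of split(maxsplit=1): a manufacturer name containing a space (for example "Land Rover", if the page lists such brands) would otherwise be cut after its first word.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the carwow page: two tables, each with a header row.
# Rows are taken from the question and answer; the live markup may differ.
SAMPLE_HTML = """
<table><tbody>
  <tr><th>Model</th><th>Lead time</th></tr>
  <tr><td><a href="#">Audi A1 Sportback</a></td><td>27</td></tr>
  <tr><td><a href="#">Audi A3 Sportback</a></td><td>27</td></tr>
</tbody></table>
<table><tbody>
  <tr><th>Model</th><th>Lead time</th></tr>
  <tr><td><a href="#">Volkswagen Golf GTI</a></td><td>45</td></tr>
  <tr><td><a href="#">Volvo XC90</a></td><td>27</td></tr>
</tbody></table>
"""

# Hypothetical list of manufacturer names that contain a space; extend as needed
MULTI_WORD_BRANDS = ("Alfa Romeo", "Aston Martin", "Land Rover")


def split_brand(name: str) -> tuple[str, str]:
    """Split 'Make Model' into (make, model), keeping multi-word makes together."""
    for brand in MULTI_WORD_BRANDS:
        if name.startswith(brand + " "):
            return brand, name[len(brand) + 1:]
    # Otherwise fall back to the answer's rule: split at the first space
    oem, model = name.split(maxsplit=1)
    return oem, model


def parse_lead_times(html: str) -> list[tuple[str, str, str]]:
    """Return (oem, model, leadtime) tuples for the data rows of every <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tbody in soup.select("tbody"):
        for tr in tbody.select("tr")[1:]:  # skip the header row, as in the answer
            cells = [td.get_text(strip=True, separator=" ") for td in tr.select("td")]
            if len(cells) < 2:
                continue  # ignore rows without both a name cell and a lead-time cell
            oem, model = split_brand(cells[0])
            # Assumption: the second cell holds a number of weeks
            results.append((oem, model, f"{cells[1]} weeks"))
    return results


for oem, model, leadtime in parse_lead_times(SAMPLE_HTML):
    print(oem, model, leadtime, sep=", ")
```

Against the live page, pass the response text fetched in the answer's code instead of SAMPLE_HTML.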