Scrape dropdown options with BeautifulSoup

Time:07-30

I want to scrape an option list with BeautifulSoup to get the list of automotive OEMs. As I just started coding, I would highly appreciate your input! Thanks in advance! :)

Desired output (remove "Make" as first entry, but list all other options):

Acura
Alfa Romeo
Aston Martin
Audi
Bentley
...

Output as of now:

Make

Code:

from bs4 import BeautifulSoup
import requests

# URL to scrape
URL = 'https://www.motortrend.com/cars/'
response = requests.get(URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')

select_tag = soup.find('select')
options = select_tag.find_all("option")
for option in options:
    print(option.text)

CodePudding user response:

The options are not in the page's initial HTML; they are loaded from a separate endpoint and hydrated into the page afterwards. Either use a webdriver or request that endpoint directly:

import requests
response = requests.get('https://www.motortrend.com/api/v2/findyournextcarform/makes')
response.json()

output:

{'data': {'makes': [{'slug': 'acura', 'displayName': 'Acura'},
   {'slug': 'alfa-romeo', 'displayName': 'Alfa Romeo'},
   {'slug': 'aston-martin', 'displayName': 'Aston Martin'},
   {'slug': 'audi', 'displayName': 'Audi'},
   {'slug': 'bentley', 'displayName': 'Bentley'},
   {'slug': 'bmw', 'displayName': 'BMW'},
   {'slug': 'bollinger', 'displayName': 'Bollinger'},
   {'slug': 'bugatti', 'displayName': 'Bugatti'},
   {'slug': 'buick', 'displayName': 'Buick'},...]
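
To get from that JSON to the plain list of names the question asks for, something like the following might work. This is only a sketch: the helper names fetch_makes_payload and extract_make_names are made up for illustration, and it assumes the payload keeps the 'data' > 'makes' > 'displayName' shape shown in the output above. The sample dictionary is copied from the first two entries of that output so the parsing step can be tried without a network call.

```python
MAKES_URL = "https://www.motortrend.com/api/v2/findyournextcarform/makes"


def fetch_makes_payload(url=MAKES_URL):
    """Download the JSON payload (needs network access and the requests package)."""
    import requests  # imported here so the parsing helper below works without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


def extract_make_names(payload):
    """Return the display names from a payload shaped like the output above."""
    return [make["displayName"] for make in payload["data"]["makes"]]


# Offline sample copied from the first entries of the output above
sample = {"data": {"makes": [
    {"slug": "acura", "displayName": "Acura"},
    {"slug": "alfa-romeo", "displayName": "Alfa Romeo"},
]}}

for name in extract_make_names(sample):
    print(name)
```

For live data, extract_make_names(fetch_makes_payload()) should give the full list, provided the endpoint still responds as it does above. Since this list comes straight from the API, there is no "Make" placeholder to remove.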

CodePudding user response:

The website you are trying to scrape has dynamic content, which cannot easily be scraped with bs4 alone because bs4 doesn't execute JavaScript. As a result, some of the content you see in the browser doesn't appear in the HTML that requests downloads.

So to get what you want, you need something like Selenium, which renders the page in a real browser.

In your case, the list you are trying to scrape is fetched by JavaScript from an API. Because your code doesn't run that JavaScript, it only sees the placeholder option.
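
To see why the original code prints only "Make", here is a small offline sketch. STATIC_HTML is a hypothetical stand-in for what requests receives before any JavaScript runs (the real markup on the page will differ; the make-select id is taken from the XPath in the Selenium example), and option_texts is an illustrative helper, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives before JavaScript runs
STATIC_HTML = """
<select id="make-select">
  <option value="">Make</option>
</select>
"""


def option_texts(html):
    """Return the text of every <option> in the first <select> of the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    select_tag = soup.find("select")
    if select_tag is None:
        return []
    return [option.get_text(strip=True) for option in select_tag.find_all("option")]


print(option_texts(STATIC_HTML))
```

With only the placeholder present in the static HTML, bs4 has nothing else to return; the remaining options only exist after the browser has run the page's scripts.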

Here is an example that uses Selenium to get your desired output.

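from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

try:
    driver.get("https://www.motortrend.com/cars/")

    # Collect every <option> inside the "make" dropdown
    elements = driver.find_elements(By.XPATH, '//select[@id="make-select"]//option')

    # Skip the first entry, which is the "Make" placeholder
    makers = [element.text for element in elements][1:]
    print(makers)
finally:
    # Close the browser even if something above fails
    driver.quit()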


I suggest using Selenium to get more out of this website.
