I am trying to use beautiful soup to pull a list of courses from a
CodePudding user response:
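The page is rendered dynamically with JavaScript, which bs4 cannot execute on its own. You can work around this by letting Selenium load the page in a real browser and then passing the rendered HTML to bs4. I use CSS selectors to pick out the DOM elements.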
Example:
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Start Chrome; webdriver_manager downloads a matching chromedriver.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
data = []

driver.get('https://www.udacity.com/courses/all?price=Free')
time.sleep(5)  # give the JavaScript time to render the course list
driver.maximize_window()
time.sleep(3)

# Parse the rendered HTML, then close the browser.
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# Each <li> under the results list is treated as one course card.
for course in soup.select('.catalog-v2_results__1FjDi > li'):
    title = course.select_one('.card_title__35G97').text
    data.append({
        'title': title
    })

df = pd.DataFrame(data)
print(df)
Output:
     title
0    Intro to Data Analysis
1    SQL for Data Analysis
2    Database Systems Concepts & Design
3    Intro to Inferential Statistics
4    Spark
..   ...
186  Front-End Interview Prep
187  Full-Stack Interview Prep
188  Data Structures & Algorithms in Swift
189  iOS Interview Prep
190  VR Interview Prep

[191 rows x 1 columns]
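One caveat about the selectors: the suffixes in `.catalog-v2_results__1FjDi` and `.card_title__35G97` look like build-generated hashes, so they may change when the site is redeployed (that is an assumption on my part, not something I have verified). A possible way to make the parsing less brittle is to match only the stable part of the class name with an attribute substring selector. The sketch below runs against a small made-up HTML snippet rather than the live page; `SAMPLE_HTML`, its hash suffixes, and `extract_titles` are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the selectors above assume.
# The hash suffixes (AAAAA, BBBBB, CCCCC) are invented for this example.
SAMPLE_HTML = """
<ul class="catalog-v2_results__AAAAA">
  <li><h2 class="card_title__BBBBB">Intro to Data Analysis</h2></li>
  <li><h2 class="card_title__BBBBB">SQL for Data Analysis</h2></li>
  <li><p class="card_summary__CCCCC">A card without a title</p></li>
</ul>
"""


def extract_titles(html):
    """Return course titles, matching classes by their stable prefix only."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # [class*="..."] matches any element whose class attribute contains the text.
    for card in soup.select('[class*="catalog-v2_results__"] > li'):
        title_tag = card.select_one('[class*="card_title__"]')
        if title_tag is None:
            # Skip cards without a title instead of raising AttributeError.
            continue
        titles.append(title_tag.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

With the Selenium code above you would pass `driver.page_source` to the function instead of `SAMPLE_HTML`. The `None` check also guards against cards that have no title element, which would otherwise make `.text` fail.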