Getting list of items in a div using beautiful soup

Time:07-07

I am trying to use Beautiful Soup to pull a list of courses from an HTML structure.

CodePudding user response:

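The page is rendered dynamically with JavaScript, and bs4 cannot execute JavaScript on its own. You can work around this by letting Selenium load the page in a real browser and then passing the rendered HTML to bs4. I use CSS selectors to pick out the DOM elements.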

Example:

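import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Start Chrome; webdriver_manager downloads a matching chromedriver.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.udacity.com/courses/all?price=Free')
time.sleep(5)  # crude wait for the JavaScript-rendered catalog to load
driver.maximize_window()
time.sleep(3)  # give the layout time to settle after resizing

# Hand the rendered HTML to BeautifulSoup, then close the browser.
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

data = []
# Each <li> under the results list is one course card.
# These class names look auto-generated and may change when the site is rebuilt.
for course in soup.select('.catalog-v2_results__1FjDi > li'):
    title_tag = course.select_one('.card_title__35G97')
    if title_tag is not None:  # skip cards without a title element
        data.append({'title': title_tag.get_text(strip=True)})

df = pd.DataFrame(data)
print(df)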

Output:

                        title
0                   Intro to Data Analysis
1                    SQL for Data Analysis
2       Database Systems Concepts & Design
3          Intro to Inferential Statistics
4                                    Spark
..                                     ...
186               Front-End Interview Prep
187              Full-Stack Interview Prep
188  Data Structures & Algorithms in Swift
189                     iOS Interview Prep
190                      VR Interview Prep

[191 rows x 1 columns]
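
The parsing step can be tried without a browser. Below is a minimal sketch, assuming a made-up HTML snippet (SAMPLE_HTML) in place of driver.page_source and a hypothetical helper named extract_titles. Instead of the full hashed class names it matches on their prefixes with attribute selectors, which may hold up better if the suffixes change; that is an assumption about how the site names its classes, not something confirmed here.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source. The class names follow the
# pattern used in the answer above, but the markup itself is made up.
SAMPLE_HTML = """
<ul class="catalog-v2_results__1FjDi">
  <li><h2 class="card_title__35G97"> Intro to Data Analysis </h2></li>
  <li><h2 class="card_title__35G97">SQL for Data Analysis</h2></li>
  <li><p>Card without a title</p></li>
</ul>
"""


def extract_titles(html):
    """Return the course titles found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # [class^="..."] matches elements whose class attribute starts with the prefix.
    for course in soup.select('ul[class^="catalog-v2_results__"] > li'):
        title_tag = course.select_one('[class^="card_title__"]')
        if title_tag is not None:  # skip cards that have no title element
            titles.append(title_tag.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

To use it with the Selenium example, pass driver.page_source to extract_titles in place of SAMPLE_HTML. The prefix match only works when the hashed class is the first class on the element; if other classes come before it, [class*="card_title__"] would be the looser alternative.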