I am trying to scrape the list of courses available on the Udacity website.
CodePudding user response:
The issue with building the soup from just the initial HTML is that the site doesn't load everything at once; it adds further courses dynamically, possibly to keep the initial page load time low. To solve this, you can use something like Selenium for Python.
Then, we'll use CSS selectors to select h2 elements with a class attribute containing "card_title" (I viewed the source on that site and it looks like that's how courses are displayed).
You'll need to download a driver for Selenium. I'm using Chrome on Windows here, so I downloaded chromedriver.exe from the list of available drivers (ChromeDriver 104.0.5112.79) for the latest stable release.
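To see how the attribute selector described above behaves in isolation, here is a minimal sketch that runs on a hard-coded snippet instead of the live site. The markup and the second hashed suffix (`__9zX1q`) are made up for illustration; the point is only that a substring match on the class keeps working if the generated suffix differs, while an exact class match does not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the course cards. The hashed suffixes
# (e.g. "__35G97") are assumed to be build-generated and may change.
SAMPLE_HTML = """
<a class="card_container__25DrK" href="/course/a">
  <h2 class="card_title__35G97">Data Engineer</h2>
</a>
<a class="card_container__25DrK" href="/course/b">
  <h2 class="card_title__9zX1q">Product Manager</h2>
</a>
<h2 class="page_heading__1abc">All Courses</h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Substring match: any h2 whose class attribute contains "card_title".
by_substring = [
    h2.get_text(strip=True) for h2 in soup.select('h2[class*="card_title"]')
]

# Exact class match: only h2 elements with this specific hashed class name.
by_exact = [h2.get_text(strip=True) for h2 in soup.select("h2.card_title__35G97")]

print(by_substring)  # ['Data Engineer', 'Product Manager']
print(by_exact)      # ['Data Engineer']
```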
Example code:
from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')

# I'm using Chrome in this example, you can search online for more on
# how Selenium works. This executable path points to where I downloaded it
browser = webdriver.Chrome(options=options, executable_path=r'C:\Users\User\Downloads\chromedriver_win32\chromedriver.exe')
browser.get("https://www.udacity.com/courses/all")

html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')

# match h2 elements with a class containing "card_title"
for course in soup.select('h2[class*="card_title"]'):
    course_name = course.get_text()
    # do something with course_name, e.g add it to a list
    print(course_name)

browser.quit()
Output:
Data Engineer Business Analytics Product Manager Programming for Data Science with Python Introduction to Programming Data Scientist Data Analyst C React Blockchain Developer Self-Driving Car Engineer Machine Learning DevOps Engineer Deep Learning SQL Front End Web Developer Full Stack Web Developer Java Programming Digital Marketing Artificial Intelligence for Trading Data Structures and Algorithms UX Designer Java Developer AWS Machine Learning Engineer Intermediate Python AI Programming with Python Growth Product Manager Intro to Self-Driving Cars Cloud DevOps Engineer Robotics Software Engineer Deep Reinforcement Learning Data Architect Android Kotlin Developer Computer Vision Data Analysis and Visualization with Microsoft Power BI Natural Language Processing Cloud Developer Zero Trust Security Data Streaming AI Product Manager Introduction to Cybersecurity iOS Developer Data Engineering with Microsoft Azure Intro to Machine Learning with TensorFlow AWS Cloud Architect Full Stack JavaScript Developer Digital Project Management Cloud Native Application Architecture Intro to Machine Learning with PyTorch Data Product Manager Flying Car and Autonomous Flight Engineer Sensor Fusion Engineer Ethical Hacker Predictive Analytics For Business Intermediate JavaScript Android Basics Artificial Intelligence Agile Software Development Marketing Analytics Data Visualization Cloud DevOps using Microsoft Azure Digital Freelancer AI for Healthcare Hybrid Cloud Engineer Data Science for Business Leaders AI for Business Leaders Privacy Engineer Site Reliability Engineer Security Engineer Cloud Developer using Microsoft Azure Cloud Architect using Microsoft Azure Machine Learning Engineer for Microsoft Azure Security Architect AI Engineer using Microsoft Azure Data Privacy Security Analyst Enterprise Security Intel® Edge AI for IoT Developers Cloud Computing for Business Leaders Programming for Data Science with R RPA Developer with UiPath Cybersecurity for Business Leaders Intro 
to Information Security Cyber-Physical Systems Security Network Security Getting Started with Google Workspace Rapid Prototyping Creating an Analytical Dataset Problem Solving with Advanced Analytics Classification Models Product Design Segmentation and Clustering Time Series Forecasting App Marketing App Monetization A/B Testing for Business Analysts How to Build a Startup Get Your Startup Started Managing Remote Teams with Upwork Google Cloud Digital Leader Training Cloud Native Fundamentals Hybrid Cloud Fundamentals Intro to Data Analysis SQL for Data Analysis Database Systems Concepts & Design Intro to Inferential Statistics Spark Data Analysis and Visualization Cyber-Physical Systems Design & Analysis Differential Equations in Action Self-Driving Fundamentals: Featuring Apollo AWS Machine Learning Foundations Course Introduction to Machine Learning using Microsoft Azure AI Fundamentals Linear Algebra Refresher Course Machine Learning: Unsupervised Learning Big Data Analytics in Healthcare Intel® Edge AI Fundamentals with OpenVINO™ Artificial Intelligence Secure and Private AI Model Building and Validation Data Visualization and D3.js Machine Learning for Trading Machine Learning Intro to Hadoop and MapReduce Real-Time Analytics with Apache Storm A/B Testing Data Analysis with R Knowledge-Based AI: Cognitive Systems Introduction to TensorFlow Lite Introduction to Computer Vision Intro to TensorFlow for Deep Learning Eigenvectors and Eigenvalues Intro to Artificial Intelligence Artificial Intelligence for Robotics Intro to Deep Learning with PyTorch AWS DeepRacer Reinforcement Learning Introduction to Machine Learning Course Product Manager Interview Preparation Microsoft Power Platform Web Tooling & Automation Front End Frameworks Responsive Web Design Fundamentals How to Install Android Studio Android Basics: Multiscreen Apps Website Performance Optimization iOS Networking with Swift JavaScript Design Patterns Android Basics: User Input Android Performance 
Responsive Images Xcode Debugging Gradle for Android and Java Build Native Mobile Apps with Flutter JavaScript Promises UIKit Fundamentals Android Basics: User Interface Client-Server Communication What is Programming? Building High Conversion Web Forms Advanced Android App Development Software Architecture & Design Authentication & Authorization: OAuth Intro to iOS App Development with Swift Introduction to Operating Systems Android Basics: Networking Web Accessibility Android Basics: Data Storage Scalable Microservices with Kubernetes Developing Android Apps with Kotlin Browser Rendering Optimization Learn Swift Programming Syntax Offline Web Applications Kotlin for Android Developers UX Design for Mobile Developers Software Development Process Data Visualization in Tableau Intro to Progressive Web Apps Writing READMEs Software Analysis & Testing iOS Persistence and Core Data Computer Networking Firebase Analytics: iOS Human-Computer Interaction 2D Game Development with libGDX Intro to jQuery How to create <anything> in Android Introduction to Graduate Algorithms Dynamic Web Applications with Sinatra How to Make a Platformer Using libGDX JavaScript Testing Object-Oriented JavaScript Localization Essentials Compilers: Theory and Practice HTML5 Canvas Object Oriented Programming in Java Designing RESTful APIs GT - Refresher - Advanced OS Intro to JavaScript Grand Central Dispatch (GCD) Continuous Integration and Deployment Swift for Beginners Intro to Statistics Intro to HTML and CSS Developing Android Apps Introduction to Python Programming Introduction to Virtual Reality Objective-C for Swift Developers Interactive 3D Graphics Full Stack Foundations High Performance Computer Architecture AutoLayout Kotlin Bootcamp for Programmers Shell Workshop Core ML: Machine Learning for iOS Statistics Intro to Theoretical Computer Science Design of Computer Programs Data Wrangling with MongoDB Swift for Developers Firebase in a Weekend: Android Software Debugging Deploying a 
Hadoop Cluster Server-Side Swift Networking for Web Developers Intro to Physics Intro to Relational Databases ES6 - JavaScript Improved Mobile Design and Usability for iOS Intro to AJAX Intro to Algorithms The MVC Pattern in Ruby WeChat Mini Program Development Asynchronous JavaScript Requests Embedded Systems High Performance Computing HTTP & Web Servers Advanced Android with Kotlin Computability, Complexity & Algorithms Advanced Operating Systems Passwordless Login Solutions for iOS Version Control with Git Firebase in a Weekend: iOS Intro to Point & Click App Development Deploying Applications with Heroku Applied Cryptography Java Programming Basics C For Programmers Intro to Backend JavaScript and the DOM Firebase Analytics: Android Configuring Linux Web Servers How to Make an iOS App Intro to DevOps Google Maps APIs Passwordless Login Solutions for Android Mobile Design and Usability for Android iOS Design Patterns Intro to Psychology Engagement & Monetization | Mobile Games Material Design for Android Developers Craft Your Cover Letter Refresh Your Resume Strengthen Your LinkedIn Network & Brand Data Science Interview Prep Android Interview Prep Machine Learning Interview Preparation Front-End Interview Prep Full-Stack Interview Prep Data Structures & Algorithms in Swift iOS Interview Prep VR Interview Prep
CodePudding user response:
The page content is loaded at runtime by JavaScript. bs4 on its own can't render or parse such dynamic content, so you can pull all the data using Selenium together with bs4 as follows:
Example:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)

url = 'https://www.udacity.com/courses/all'
driver.get(url)
driver.maximize_window()
time.sleep(3)

soup = BeautifulSoup(driver.page_source, 'lxml')

lst = []
for card in soup.find_all("a", class_="card_container__25DrK"):
    title = card.select_one('h2.card_title__35G97').text
    lst.append(title)

print(lst)
Output:
['Data Engineer', 'Business Analytics', 'Product Manager', 'Programming for Data Science with Python', 'Introduction to Programming', 'Data Scientist', 'Data Analyst', 'C ', 'React', 'Blockchain Developer', 'Self-Driving Car Engineer', 'Machine Learning DevOps Engineer', 'Deep Learning', 'SQL', 'Front End Web Developer', 'Full Stack Web Developer', 'Java Programming', 'Digital Marketing', 'Artificial Intelligence for Trading', 'Data Structures and Algorithms', 'UX Designer', 'Java Developer', 'AWS Machine Learning Engineer', 'Intermediate Python', 'AI Programming with Python', 'Growth Product Manager', 'Intro to Self-Driving Cars', 'Cloud DevOps Engineer', 'Robotics Software Engineer', 'Deep Reinforcement Learning', 'Data Architect', 'Android Kotlin Developer', 'Computer Vision', 'Data Analysis and Visualization with Microsoft Power BI', 'Natural Language Processing', 'Cloud Developer', 'Zero Trust Security', 'Data Streaming', 'AI Product Manager', 'Introduction to Cybersecurity', 'iOS Developer', 'Data Engineering with Microsoft Azure', 'Intro to Machine Learning with TensorFlow', 'AWS Cloud Architect', 'Full Stack JavaScript Developer', 'Digital Project Management', 'Cloud Native Application Architecture', 'Intro to Machine Learning with PyTorch', 'Data Product Manager', 'Flying Car and Autonomous Flight Engineer', 'Sensor Fusion Engineer', 'Ethical Hacker', 'Predictive Analytics For Business', 'Intermediate JavaScript', 'Android Basics', 'Artificial Intelligence', 'Agile Software Development', 'Marketing Analytics', 'Data Visualization', 'Cloud DevOps using Microsoft Azure', 'Digital Freelancer', 'AI for Healthcare', 'Hybrid Cloud Engineer', 'Data Science for Business Leaders', 'AI for Business Leaders', 'Privacy Engineer', 'Site Reliability Engineer', 'Security Engineer', 'Cloud Developer using Microsoft Azure', 'Cloud Architect using Microsoft Azure', 'Machine Learning Engineer for Microsoft Azure', 'Security Architect', 'AI Engineer using Microsoft Azure', 
'Data Privacy', 'Security Analyst', 'Enterprise Security', 'Intel® Edge AI for IoT Developers', 'Cloud Computing for Business Leaders', 'Programming for Data Science with R', 'RPA Developer with UiPath', 'Cybersecurity for Business Leaders', 'Intro to Information Security', 'Cyber-Physical Systems Security', 'Network Security', 'Getting Started with Google Workspace', 'Rapid Prototyping', 'Creating an Analytical Dataset', 'Problem Solving with Advanced Analytics', 'Classification Models', 'Product Design', 'Segmentation and Clustering', 'Time Series Forecasting', 'App Marketing', 'App Monetization', 'A/B Testing for Business Analysts', 'How to Build a Startup', 'Get Your Startup Started', 'Managing Remote Teams with Upwork', 'Google Cloud Digital Leader Training', 'Cloud Native Fundamentals', 'Hybrid Cloud Fundamentals', 'Intro to Data Analysis', 'SQL for Data Analysis', 'Database Systems Concepts & Design', 'Intro to Inferential Statistics', 'Spark', 'Data Analysis and Visualization', 'Cyber-Physical Systems Design & Analysis', 'Differential Equations in Action', 'Self-Driving Fundamentals: Featuring Apollo ', 'AWS Machine Learning Foundations Course', 'Introduction to Machine Learning using Microsoft Azure', 'AI Fundamentals', 'Linear Algebra Refresher Course', 'Machine Learning: Unsupervised Learning', 'Big Data Analytics in Healthcare', 'Intel® Edge AI Fundamentals with OpenVINO™', 'Artificial Intelligence', 'Secure and Private AI', 'Model Building and Validation', 'Data Visualization and D3.js', 'Machine Learning for Trading', 'Machine Learning', 'Intro to Hadoop and MapReduce', 'Real-Time Analytics with Apache Storm', 'A/B Testing', 'Data Analysis with R', 'Knowledge-Based AI: Cognitive Systems', 'Introduction to TensorFlow Lite', 'Introduction to Computer Vision', 'Intro to TensorFlow for Deep Learning', 'Eigenvectors and Eigenvalues', 'Intro to Artificial Intelligence', 'Artificial Intelligence for Robotics', 'Intro to Deep Learning with PyTorch', 'AWS 
DeepRacer', 'Reinforcement Learning', 'Introduction to Machine Learning Course', 'Product Manager Interview Preparation', 'Microsoft Power Platform', 'Web Tooling & Automation', 'Front End Frameworks', 'Responsive Web Design Fundamentals', 'How to Install Android Studio', 'Android Basics: Multiscreen Apps', 'Website Performance Optimization', 'iOS Networking with Swift', 'JavaScript Design Patterns', 'Android Basics: User Input', 'Android Performance', 'Responsive Images', 'Xcode Debugging', 'Gradle for Android and Java', 'Build Native Mobile Apps with Flutter', 'JavaScript Promises', 'UIKit Fundamentals', 'Android Basics: User Interface', 'Client-Server Communication', 'What is Programming?', 'Building High Conversion Web Forms', 'Advanced Android App Development', 'Software Architecture & Design', 'Authentication & Authorization: OAuth', 'Intro to iOS App Development with Swift', 'Introduction to Operating Systems', 'Android Basics: Networking', 'Web Accessibility', 'Android Basics: Data Storage', 'Scalable Microservices with Kubernetes', 'Developing Android Apps with Kotlin', 'Browser Rendering Optimization', 'Learn Swift Programming Syntax', 'Offline Web Applications', 'Kotlin for Android Developers', 'UX Design for Mobile Developers', 'Software Development Process', 'Data Visualization in Tableau', 'Intro to Progressive Web Apps', 'Writing READMEs', 'Software Analysis & Testing', 'iOS Persistence and Core Data', 'Computer Networking', 'Firebase Analytics: iOS', 'Human-Computer Interaction', '2D Game Development with libGDX', 'Intro to jQuery', 'How to create <anything> in Android', 'Introduction to Graduate Algorithms', 'Dynamic Web Applications with Sinatra', 'How to Make a Platformer Using libGDX', 'JavaScript Testing', 'Object-Oriented JavaScript', 'Localization Essentials', 'Compilers: Theory and Practice', 'HTML5 Canvas', 'Object Oriented Programming in Java', 'Designing RESTful APIs', 'GT - Refresher - Advanced OS', 'Intro to JavaScript', 'Grand Central 
Dispatch (GCD)', 'Continuous Integration and Deployment', 'Swift for Beginners', 'Intro to Statistics', 'Intro to HTML and CSS', 'Developing Android Apps', 'Introduction to Python Programming', 'Introduction to Virtual Reality', 'Objective-C for Swift Developers', 'Interactive 3D Graphics', 'Full Stack Foundations', 'High Performance Computer Architecture', 'AutoLayout', 'Kotlin Bootcamp for Programmers', 'Shell Workshop', 'Core ML: Machine Learning for iOS', 'Statistics', 'Intro to Theoretical Computer Science', 'Design of Computer Programs', 'Data Wrangling with MongoDB', 'Swift for Developers', 'Firebase in a Weekend: Android', 'Software Debugging', 'Deploying a Hadoop Cluster', 'Server-Side Swift', 'Networking for Web Developers', 'Intro to Physics', 'Intro to Relational Databases', 'ES6 - JavaScript Improved', 'Mobile Design and Usability for iOS', 'Intro to AJAX', 'Intro to Algorithms', 'The MVC Pattern in Ruby', 'WeChat Mini Program Development', 'Asynchronous JavaScript Requests', 'Embedded Systems', 'High Performance Computing', 'HTTP & Web Servers', 'Advanced Android with Kotlin', 'Computability, Complexity & Algorithms', 'Advanced Operating Systems', 'Passwordless Login Solutions for iOS', 'Version Control with Git', 'Firebase in a Weekend: iOS', 'Intro to Point & Click App Development', 'Deploying Applications with Heroku', 'Applied Cryptography', 'Java Programming Basics', 'C For Programmers', 'Intro to Backend', 'JavaScript and the DOM', 'Firebase Analytics: Android', 'Configuring Linux Web Servers', 'How to Make an iOS App', 'Intro to DevOps', 'Google Maps APIs', 'Passwordless Login Solutions for Android', 'Mobile Design and Usability for Android', 'iOS Design Patterns', 'Intro to Psychology', 'Engagement & Monetization | Mobile Games', 'Material Design for Android Developers', 'Craft Your Cover Letter', 'Refresh Your Resume', 'Strengthen Your LinkedIn Network & Brand', 'Data Science Interview Prep', 'Android Interview Prep', 'Machine Learning 
Interview Preparation', 'Front-End Interview Prep', 'Full-Stack Interview Prep', 'Data Structures & Algorithms in Swift', 'iOS Interview Prep', 'VR Interview Prep']
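The fixed time.sleep(3) pause in the example above can be too short on a slow connection and wasteful on a fast one. One possible alternative, sketched here under assumptions, is Selenium's explicit wait, which blocks only until a matching element appears. The helper names `extract_titles` and `fetch_rendered_html` and the 10-second timeout are hypothetical, and the wait only guarantees that the first card title is present, not that every course has finished loading.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the text of every h2 whose class contains "card_title"."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        h2.get_text(strip=True) for h2 in soup.select('h2[class*="card_title"]')
    ]


def fetch_rendered_html(url, timeout=10):
    """Load url in Chrome and return the page source once a card title exists."""
    # Imported here so extract_titles() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome(service=Service("./chromedriver"))  # your chromedriver path
    try:
        driver.get(url)
        # Wait until at least one matching element is in the DOM,
        # raising TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, 'h2[class*="card_title"]')
            )
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

With a working chromedriver, `extract_titles(fetch_rendered_html('https://www.udacity.com/courses/all'))` would then return the list of titles. Keeping the parsing in its own function also makes it easy to try on saved HTML without opening a browser.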
CodePudding user response:
The main issue here is that BeautifulSoup by itself only performs static scraping, i.e. it gets just the static HTML. You will need to use something like Selenium with BeautifulSoup to scrape dynamically generated HTML. You may find the following tutorial useful: WebScraping with BeautifulSoup and Selenium
Additionally, you should ensure the correct tag is being targeted. For example, in your screenshot, the target is an anchor tag, so your find_all should be as follows:
name = soup.find_all('a', class_='card_container__25DrK')
However, do check the HTML retrieved by your program to make sure you are targeting the correct tag and specifying the correct attribute value.
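One way to do that check is to count how many elements each candidate selector matches in the HTML your program actually received. The following is only a sketch: `count_matches` is a hypothetical helper, and `retrieved_html` is a made-up stand-in for your real page source.

```python
from bs4 import BeautifulSoup


def count_matches(html, selectors):
    """Map each CSS selector to the number of elements it matches in html."""
    soup = BeautifulSoup(html, "html.parser")
    return {selector: len(soup.select(selector)) for selector in selectors}


# Hypothetical stand-in for the HTML your program actually retrieved.
retrieved_html = """
<a class="card_container__25DrK"><h2 class="card_title__35G97">SQL</h2></a>
<a class="card_container__25DrK"><h2 class="card_title__35G97">React</h2></a>
"""

report = count_matches(
    retrieved_html,
    [
        "a.card_container__25DrK",    # anchor tag with the expected class
        "div.card_container__25DrK",  # same class, but the wrong tag
        'h2[class*="card_title"]',    # titles, matched by class substring
    ],
)
for selector, count in report.items():
    print(f"{count:3d}  {selector}")
```

A count of zero for a selector means either the tag or the class value is wrong, or the content had not been rendered yet when the HTML was captured.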