How to parse a specific HTML tag using BeautifulSoup?

Time:12-17

I am trying to webscrape this website: https://datausa.io/profile/university/cuny-city-college/

My code retrieves only the first div with class stat-value, which is the tuition figure, but I want the Room and Board cost instead. How do I select that specific tag?

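import requests
from bs4 import BeautifulSoup

url = requests.get('https://datausa.io/profile/university/cuny-city-college/')
soup = BeautifulSoup(url.text, 'html.parser')

# find() returns only the first div with class "stat-value" (the tuition figure)
rb = soup.find('div', class_='stat-value')

print(rb.prettify())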

CodePudding user response:

Call find() on the stat-title div and pass the label text "Room and Board" so that it matches only that tag. The value sits in the tag just before the title, so call find_previous() on the result to get it.

import requests
from bs4 import BeautifulSoup

url = requests.get('https://datausa.io/profile/university/cuny-city-college/')
soup = BeautifulSoup(url.text, 'html.parser')

# string= is the current name for the older text= argument
rb = soup.find('div', class_='stat-title', string="Room and Board").find_previous()
print(rb.get_text())

Output:

$15,406
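
If you want to try this without hitting the site, here is a minimal offline sketch. The HTML fragment is hypothetical: it assumes each stat-value div is immediately followed by its stat-title div, and the tuition figure is a made-up placeholder. The helper name value_for_title is mine, not part of BeautifulSoup. It also passes explicit filters to find_previous() so that only a stat-value div can match, and it returns None instead of raising when the title is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: assumes each stat-value div is followed by its
# stat-title div. The tuition figure is a made-up placeholder.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,000</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$15,406</div>
  <div class="stat-title">Room and Board</div>
</div>
"""


def value_for_title(html, title):
    """Return the stat-value text that precedes the given stat-title, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Exact match on the title div's text
    label = soup.find("div", class_="stat-title", string=title)
    if label is None:
        return None
    # Walk backwards in document order to the nearest stat-value div
    value = label.find_previous("div", class_="stat-value")
    return value.get_text(strip=True) if value is not None else None


print(value_for_title(SAMPLE_HTML, "Room and Board"))
```

Note that string= needs the title text to match exactly, so extra whitespace inside the real tag would make find() return None.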

CodePudding user response:

You can use :has, :-soup-contains, and an adjacent sibling combinator (+) to select the stat-value element whose immediately following sibling is a stat-title containing the text "Room and Board":

import requests
from bs4 import BeautifulSoup as bs

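# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = bs(requests.get('https://datausa.io/profile/university/cuny-city-college/').text, 'html.parser')
print(soup.select_one('.stat-value:has(+ .stat-title:-soup-contains("Room and Board"))').text)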
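
The selector can also be checked offline. This is only a sketch under assumptions: the HTML fragment is hypothetical (value div immediately followed by its title div, tuition figure made up), the helper name select_value is mine, and :-soup-contains needs a reasonably recent soupsieve (2.1 or later, as far as I know).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: assumes each stat-value div is immediately
# followed by its stat-title div. The tuition figure is a placeholder.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,000</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$15,406</div>
  <div class="stat-title">Room and Board</div>
</div>
"""


def select_value(html, title):
    """Return the text of the stat-value whose next sibling title contains `title`."""
    soup = BeautifulSoup(html, "html.parser")
    # "+" inside :has() means "directly followed by"; whitespace between
    # the two divs is ignored by the sibling combinator.
    selector = f'.stat-value:has(+ .stat-title:-soup-contains("{title}"))'
    node = soup.select_one(selector)  # None when nothing matches
    return node.get_text(strip=True) if node is not None else None


print(select_value(SAMPLE_HTML, "Room and Board"))
```

Unlike string= in find(), :-soup-contains is a substring match, so "Room" alone would also select this element. The title is pasted into the selector as-is, so this sketch is not meant for titles containing double quotes.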