I am trying to webscrape this website: https://datausa.io/profile/university/cuny-city-college/
My code retrieves only the first div with class stat-value, which is the tuition, but I want only the Room and Board cost. How do I select a specific tag?
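import requests
from bs4 import BeautifulSoup

url = requests.get('https://datausa.io/profile/university/cuny-city-college/')
soup = BeautifulSoup(url.text, 'html.parser')
# find() returns only the first matching div, which here is the tuition value
rb = soup.find('div', class_='stat-value')
print(rb.prettify())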
CodePudding user response:
You can call the find method on the stat-title class and pass the specific text, so that it finds the title tag. The value sits in the tag just before it, so call the find_previous method on the result to extract it:
import requests
from bs4 import BeautifulSoup

url = requests.get('https://datausa.io/profile/university/cuny-city-college/')
soup = BeautifulSoup(url.text, 'html.parser')
# string= is the current name for the older text= argument
rb = soup.find('div', class_='stat-title', string="Room and Board").find_previous()
print(rb.get_text())
Output:
$15,406
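The same lookup can be tried offline. The sketch below is a minimal illustration, not the site's real markup: SAMPLE_HTML, the "$1,234" tuition value, and the helper name stat_value_for are made up for the example, and it assumes each stat-value div is a sibling placed directly before its stat-title div. It uses find_previous_sibling instead of find_previous, a slightly stricter variant that only looks at siblings, and it returns None rather than raising when the title is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes each value div directly precedes its title div.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,234</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$15,406</div>
  <div class="stat-title">Room and Board</div>
</div>
"""

def stat_value_for(html, title):
    """Return the stat-value text paired with the given stat-title, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the title div whose text is exactly `title`.
    title_div = soup.find("div", class_="stat-title", string=title)
    if title_div is None:
        return None
    # Step back to the nearest preceding sibling div that holds the value.
    value_div = title_div.find_previous_sibling("div", class_="stat-value")
    return value_div.get_text(strip=True) if value_div else None

print(stat_value_for(SAMPLE_HTML, "Room and Board"))
```

If the real page nests extra tags inside stat-title, the exact string match may fail, and the lookup would need adjusting.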
CodePudding user response:
You can use :has, :-soup-contains, and an adjacent sibling combinator (+) to select the stat-value element that has an immediately adjacent stat-title sibling containing the text "Room and Board":
import requests
from bs4 import BeautifulSoup as bs
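soup = bs(requests.get('https://datausa.io/profile/university/cuny-city-college/').text, 'html.parser')
# + inside :has() requires the stat-title to be the very next sibling of stat-value
print(soup.select_one('.stat-value:has(+ .stat-title:-soup-contains("Room and Board"))').text)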
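To see how the selector behaves without hitting the site, here is a small sketch against inline HTML. The markup, the "$1,234" placeholder, and the helper name select_stat_value are assumptions made for illustration. It also assumes a bs4 install whose bundled soupsieve is recent enough to support :-soup-contains.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes stat-title is the next sibling element of stat-value.
SAMPLE_HTML = """
<div class="stat">
  <div class="stat-value">$1,234</div>
  <div class="stat-title">Tuition</div>
</div>
<div class="stat">
  <div class="stat-value">$15,406</div>
  <div class="stat-title">Room and Board</div>
</div>
"""

def select_stat_value(html, title):
    """Return the text of the stat-value whose adjacent stat-title contains `title`."""
    soup = BeautifulSoup(html, "html.parser")
    # :has(+ X) keeps a .stat-value only if its next sibling element matches X.
    # Note: `title` is interpolated as-is, so it must not contain double quotes.
    selector = f'.stat-value:has(+ .stat-title:-soup-contains("{title}"))'
    match = soup.select_one(selector)
    return match.get_text(strip=True) if match else None

print(select_stat_value(SAMPLE_HTML, "Room and Board"))
```

Because :-soup-contains does a substring match, a shorter title such as "Room" would also select the same element.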