#scraping ESPN
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.espn.com/womens-college-basketball/scoreboard/_/date/20221107').text
soup = BeautifulSoup(html_text, 'lxml')
game = soup.find('ul', class_= "ScoreCell__Competitors").text
print(game)
# Expected: the text "Cleveland State". I'm new to web scraping, so any help is appreciated.
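To see why calling `.text` on the whole `ul` likely returns more than one team name, here is a minimal offline sketch. The HTML snippet is hypothetical and only borrows the class name `ScoreCell__Competitors` from the question; ESPN's real markup has more nesting and classes. It uses the built-in `html.parser` backend so `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: only the class name comes from the question.
SAMPLE_HTML = """
<ul class="ScoreCell__Competitors">
  <li><div>Cleveland State</div></li>
  <li><div>Opponent</div></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
competitors = soup.find("ul", class_="ScoreCell__Competitors")

# .text joins the text of every descendant, so both team names come back together.
print(repr(competitors.text))

# Narrowing to the first <li>, then its first <div>, yields only the first team.
first_team = competitors.find("li").find("div").get_text(strip=True)
print(first_team)  # Cleveland State
```

The same idea (select the first `li`, then the element holding the name) is what the CSS-selector answer does.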
Answer:
Try using Selenium with Chrome.
Download Chrome and ChromeDriver (the ChromeDriver major version must match your Chrome version), then install Selenium:
pip install selenium
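from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
DRIVER_PATH = '/path/to/chromedriver'
# executable_path is deprecated in Selenium 4 (and removed in later releases); pass the path via Service
driver = webdriver.Chrome(service=Service(DRIVER_PATH))
driver.get('https://www.espn.com/womens-college-basketball/scoreboard/_/date/20221107')
Then locate the element by its class name and read its text:
competitors = driver.find_element(By.CLASS_NAME, 'ScoreCell__Competitors')
print(competitors.text)
driver.quit()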
Answer:
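Element selection is a bit tricky here. Select the first <li> of every ScoreCell__Competitors list (the first team listed in each game), then read the <div> holding the team name inside it: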
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.espn.com/womens-college-basketball/scoreboard/_/date/20221107').text
soup = BeautifulSoup(html_text, 'lxml')
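# Select the first <li> (the first team listed) in every game's competitor list
games = soup.select('ul.ScoreCell__Competitors li:first-child')
lst = []
for game in games:
    # The original attribute selector was lost; matching a class containing
    # "ScoreCell__TeamName" is an assumption about ESPN's markup
    name = game.select_one('div[class*="ScoreCell__TeamName"]').get_text(strip=True)
    lst.append(name)
# Print one team name per line
print('\n'.join(lst))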
Output:
Cleveland State
Omaha
Cincinnati
Quinnipiac
Mount St. Mary's
Oral Roberts
Temple
Northwestern
Northern Illinois
Maryland
Bellarmine
Lamar
Creighton
East Tennessee State
Southern
San Diego State
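If some list item lacks the team-name element, `select_one` returns `None` and calling `.get_text` on it raises `AttributeError`. Below is a minimal sketch that wraps the selection above in a reusable function and skips such entries. The function name `first_team_names`, the sample HTML, and the `ScoreCell__TeamName` class fragment are hypothetical; only `ScoreCell__Competitors` comes from the question. It uses `html.parser` so it runs without `lxml`.

```python
from __future__ import annotations

from bs4 import BeautifulSoup


def first_team_names(html: str) -> list[str]:
    """Return the first-listed team name of each game (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # :first-child keeps only the first <li> (first team) in each competitor list
    for li in soup.select("ul.ScoreCell__Competitors li:first-child"):
        # Assumed class fragment for the team-name <div>; adjust to the real markup
        div = li.select_one('div[class*="ScoreCell__TeamName"]')
        if div is None:
            # Skip entries without a team-name <div> instead of raising AttributeError
            continue
        names.append(div.get_text(strip=True))
    return names


# Hypothetical sample: two complete games and one entry with no team-name <div>
SAMPLE_HTML = """
<ul class="ScoreCell__Competitors">
  <li><div class="ScoreCell__TeamName">Team A</div></li>
  <li><div class="ScoreCell__TeamName">Team B</div></li>
</ul>
<ul class="ScoreCell__Competitors">
  <li><div class="ScoreCell__TeamName">Team C</div></li>
  <li><div class="ScoreCell__TeamName">Team D</div></li>
</ul>
<ul class="ScoreCell__Competitors">
  <li><span>TBD</span></li>
</ul>
"""

print(first_team_names(SAMPLE_HTML))  # ['Team A', 'Team C']
```

For a live page you would pass `requests.get(url).text` instead of the sample. Getting back an empty list, rather than an exception, is a quick signal that the page markup no longer matches the assumed selectors.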