I'm really new to web scraping and saw a few questions similar to mine, but those solutions didn't work for me. I'm trying to scrape https://www.nba.com/schedule for the h4 tags, which hold the dates and times of upcoming basketball games. I'm using Beautiful Soup to grab those tags, but it always returns an empty list. Here's the code I'm using right now:
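import requests
from bs4 import BeautifulSoup
url = "https://www.nba.com/schedule"
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
schedule = doc.find_all('h4')  # comes back as []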
I saw something in another answer about the h4 tags being inside tags and I tried to use a json module but couldn't get that to work. Thanks for your help in advance!
Answer:
The data you see on the page is loaded from an external URL after the initial HTML arrives, so it is not in the HTML that requests downloads and BeautifulSoup never sees it. To load the data directly, you can use the following example:
import json
import requests

url = "https://cdn.nba.com/static/json/staticData/scheduleLeagueV2_1.json"
data = requests.get(url).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

for g in data["leagueSchedule"]["gameDates"]:
    print(g["gameDate"])
    for game in g["games"]:
        print(
            game["homeTeam"]["teamCity"],
            game["homeTeam"]["teamName"],
            "-",
            game["awayTeam"]["teamCity"],
            game["awayTeam"]["teamName"],
        )
    print()
Prints:
10/3/2021 12:00:00 AM
Los Angeles Lakers - Brooklyn Nets
10/4/2021 12:00:00 AM
Toronto Raptors - Philadelphia 76ers
Boston Celtics - Orlando Magic
Miami Heat - Atlanta Hawks
Minnesota Timberwolves - New Orleans Pelicans
Oklahoma City Thunder - Charlotte Hornets
San Antonio Spurs - Utah Jazz
Portland Trail Blazers - Golden State Warriors
Sacramento Kings - Phoenix Suns
LA Clippers - Denver Nuggets
10/5/2021 12:00:00 AM
New York Knicks - Indiana Pacers
Chicago Bulls - Cleveland Cavaliers
Houston Rockets - Washington Wizards
Memphis Grizzlies - Milwaukee Bucks
...and so on.
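If you want the dates as real date objects instead of strings, here is a minimal sketch. It assumes every gameDate value follows the format seen in the printed output (month/day/year with a 12-hour clock), which I have not verified for the whole feed. The helper names parse_game_date and describe_game are made up for this illustration, and the sample data is hand-written from the first printed entry so the snippet runs without a network call.

```python
from datetime import datetime

# Assumption: gameDate strings look like "10/3/2021 12:00:00 AM",
# as in the printed output. Adjust the format if the feed differs.
GAME_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_game_date(raw):
    """Hypothetical helper: convert a gameDate string to a datetime.date."""
    return datetime.strptime(raw, GAME_DATE_FORMAT).date()


def describe_game(game):
    """Hypothetical helper: build the 'home - away' line for one game."""
    home, away = game["homeTeam"], game["awayTeam"]
    return (
        f'{home["teamCity"]} {home["teamName"]} - '
        f'{away["teamCity"]} {away["teamName"]}'
    )


# Hand-written sample in the same shape as data["leagueSchedule"]["gameDates"].
sample_game_dates = [
    {
        "gameDate": "10/3/2021 12:00:00 AM",
        "games": [
            {
                "homeTeam": {"teamCity": "Los Angeles", "teamName": "Lakers"},
                "awayTeam": {"teamCity": "Brooklyn", "teamName": "Nets"},
            }
        ],
    }
]

for g in sample_game_dates:
    day = parse_game_date(g["gameDate"])
    for game in g["games"]:
        print(day.isoformat(), describe_game(game))
```

With the real response you would loop over data["leagueSchedule"]["gameDates"] in place of sample_game_dates.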