I am new to web scraping, so I am testing with the NBA data on Basketball Reference. I am trying to collect the standings for the league, the conferences and the divisions, and then store them in a database.
So far I have the code below, which prints the team names of the Eastern Conference.
I need to loop through the HTML and collect the remaining data points, but I am unsure how to proceed.
import requests
from bs4 import BeautifulSoup
url = 'https://www.basketball-reference.com/leagues/NBA_2022_standings.html'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
eastern_conf_table = soup.find('table', id='confs_standings_E')
for team in eastern_conf_table.find_all('tbody'):
    rows = team.find_all("tr")
    # loop over all rows, get the header cells of each
    for row in rows:
        try:
            teams = row.find_all('th')
            # print the team name from the link in the row's first header cell
            print(teams[0].a.text)
        except (IndexError, AttributeError):
            # skip rows that have no header cell or no link inside it
            pass
I will then need to collect the same data for the other conference, the divisions and the league as a whole.
CodePudding user response:
The easiest way to do it is with pandas:
import pandas as pd
df = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2022_standings.html', match='Eastern Conference')
print(df[0])
OUTPUT:
Eastern Conference W L W/L% GB PS/G PA/G SRS
0 Miami Heat* (1) 53 29 0.646 — 110.0 105.6 4.23
1 Boston Celtics* (2) 51 31 0.622 2.0 111.8 104.5 7.02
2 Milwaukee Bucks* (3) 51 31 0.622 2.0 115.5 112.1 3.22
3 Philadelphia 76ers* (4) 51 31 0.622 2.0 109.9 107.3 2.57
4 Toronto Raptors* (5) 48 34 0.585 5.0 109.4 107.1 2.38
5 Chicago Bulls* (6) 46 36 0.561 7.0 111.6 112.0 -0.38
6 Brooklyn Nets* (7) 44 38 0.537 9.0 112.9 112.1 0.82
7 Cleveland Cavaliers (8) 44 38 0.537 9.0 107.8 105.7 2.04
8 Atlanta Hawks* (9) 43 39 0.524 10.0 113.9 112.4 1.55
9 Charlotte Hornets (10) 43 39 0.524 10.0 115.3 114.9 0.53
10 New York Knicks (11) 37 45 0.451 16.0 106.5 106.6 -0.01
11 Washington Wizards (12) 35 47 0.427 18.0 108.6 112.0 -3.23
12 Indiana Pacers (13) 25 57 0.305 28.0 111.5 114.9 -3.26
13 Detroit Pistons (14) 23 59 0.280 30.0 104.8 112.5 -7.36
14 Orlando Magic (15) 22 60 0.268 31.0 104.2 112.2 -7.67
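The first column in the output above combines the team name, an asterisk and the seed in parentheses. If you want those as separate fields before storing them, a regular expression can split them. This is a minimal sketch: the helper name `split_team_label` is hypothetical, and treating the asterisk as a plain yes/no flag is an assumption (on this page it appears to mark playoff teams).

```python
import re

# Matches labels such as "Miami Heat* (1)":
# team name, an optional asterisk, then the seed in parentheses.
LABEL_RE = re.compile(r"^(?P<name>.+?)(?P<asterisk>\*)?\s*\((?P<seed>\d+)\)$")


def split_team_label(label):
    """Split a standings label into (team name, has asterisk, seed)."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        # Unexpected format: keep the raw text rather than guess.
        return label.strip(), False, None
    return (
        match.group("name").strip(),
        match.group("asterisk") is not None,
        int(match.group("seed")),
    )


print(split_team_label("Miami Heat* (1)"))          # ('Miami Heat', True, 1)
print(split_team_label("Cleveland Cavaliers (8)"))  # ('Cleveland Cavaliers', False, 8)
```

With pandas you could apply the same function to the first column of `df[0]`, for example with `Series.apply`.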
If you need to use BeautifulSoup and Requests instead, here is an example:
import requests
from bs4 import BeautifulSoup
url = 'https://www.basketball-reference.com/leagues/NBA_2022_standings.html'
soup = BeautifulSoup(requests.get(url).text, features='lxml')
confs_standings_E = soup.find('table', attrs={'id': 'confs_standings_E'})
# each team row carries the class "full_table"; cells are identified by their data-stat attribute
for stats in confs_standings_E.find_all('tr', class_='full_table'):
    team_name = stats.find('th', attrs={'data-stat': 'team_name'}).getText().strip()
    wins = stats.find('td', attrs={'data-stat': 'wins'}).getText().strip()
    losses = stats.find('td', attrs={'data-stat': 'losses'}).getText().strip()
    win_loss_pct = stats.find('td', attrs={'data-stat': 'win_loss_pct'}).getText().strip()
    gb = stats.find('td', attrs={'data-stat': 'gb'}).getText().strip()
    pts_per_g = stats.find('td', attrs={'data-stat': 'pts_per_g'}).getText().strip()
    opp_pts_per_g = stats.find('td', attrs={'data-stat': 'opp_pts_per_g'}).getText().strip()
    srs = stats.find('td', attrs={'data-stat': 'srs'}).getText().strip()
    print(team_name, wins, losses, win_loss_pct, gb, pts_per_g, opp_pts_per_g, srs)
OUTPUT:
Miami Heat* (1) 53 29 .646 — 110.0 105.6 4.23
Boston Celtics* (2) 51 31 .622 2.0 111.8 104.5 7.02
Milwaukee Bucks* (3) 51 31 .622 2.0 115.5 112.1 3.22
Philadelphia 76ers* (4) 51 31 .622 2.0 109.9 107.3 2.57
Toronto Raptors* (5) 48 34 .585 5.0 109.4 107.1 2.38
Chicago Bulls* (6) 46 36 .561 7.0 111.6 112.0 -0.38
Brooklyn Nets* (7) 44 38 .537 9.0 112.9 112.1 0.82
Cleveland Cavaliers (8) 44 38 .537 9.0 107.8 105.7 2.04
Atlanta Hawks* (9) 43 39 .524 10.0 113.9 112.4 1.55
Charlotte Hornets (10) 43 39 .524 10.0 115.3 114.9 0.53
New York Knicks (11) 37 45 .451 16.0 106.5 106.6 -0.01
Washington Wizards (12) 35 47 .427 18.0 108.6 112.0 -3.23
Indiana Pacers (13) 25 57 .305 28.0 111.5 114.9 -3.26
Detroit Pistons (14) 23 59 .280 30.0 104.8 112.5 -7.36
Orlando Magic (15) 22 60 .268 31.0 104.2 112.2 -7.67
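Since the question also asks about storing the data, here is a rough sketch of one way to do it with the standard-library `sqlite3` module. The table name `standings`, its schema, the `conference` column and the helpers `to_float` and `save_standings` are all assumptions made for illustration; the two sample rows are copied from the output above. The scraped values are strings, so they are converted first, and the dash in the games-behind column of the leading team is stored as NULL.

```python
import sqlite3

# Sample rows in the same order as the print() call above:
# team, wins, losses, win_loss_pct, gb, pts_per_g, opp_pts_per_g, srs
rows = [
    ("Miami Heat* (1)", "53", "29", ".646", "\u2014", "110.0", "105.6", "4.23"),
    ("Boston Celtics* (2)", "51", "31", ".622", "2.0", "111.8", "104.5", "7.02"),
]


def to_float(text):
    """Convert scraped text to float; non-numeric text (the dash) becomes None."""
    try:
        return float(text)
    except ValueError:
        return None


def save_standings(conn, conference, rows):
    """Create the (hypothetical) standings table if needed and upsert the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS standings (
               team TEXT, conference TEXT,
               wins INTEGER, losses INTEGER, win_loss_pct REAL,
               gb REAL, pts_per_g REAL, opp_pts_per_g REAL, srs REAL,
               PRIMARY KEY (team, conference))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO standings VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (team, conference, int(w), int(l), *map(to_float, rest))
            for team, w, l, *rest in rows
        ],
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
save_standings(conn, "Eastern", rows)
for record in conn.execute("SELECT team, wins, losses, gb FROM standings ORDER BY gb"):
    print(record)
```

In the scraping loop you would append each team's values to `rows` instead of printing them, then call `save_standings` once per table with a label of your choice, which lets the other conference and the division tables share the same schema.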