Looping through HTML to collect data

Time:06-22

I am new to web scraping, so I am testing with the NBA data on Basketball Reference. I am trying to collect the standings for the league, the conferences and the divisions, and then store them in a database.

So far I have the code below, which gives me the team names of the Eastern Conference.

I need to loop through the HTML and collect the data points, but I am unsure how to proceed.

import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2022_standings.html'

r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

eastern_conf_table = soup.find('table', id='confs_standings_E')

# loop over every row of every table body
for body in eastern_conf_table.find_all('tbody'):
    for row in body.find_all('tr'):
        # the team name is the link inside the row's header cell
        header = row.find('th')
        if header is not None and header.a is not None:
            print(header.a.text)

I will then need to collect the same data for the other conference, the divisions and the league.

CodePudding user response:

The easiest way to do it is with pandas:

import pandas as pd

df = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2022_standings.html', match='Eastern Conference')
print(df[0])

OUTPUT:

         Eastern Conference   W   L   W/L%    GB   PS/G   PA/G   SRS
0           Miami Heat* (1)  53  29  0.646     —  110.0  105.6  4.23
1       Boston Celtics* (2)  51  31  0.622   2.0  111.8  104.5  7.02
2      Milwaukee Bucks* (3)  51  31  0.622   2.0  115.5  112.1  3.22
3   Philadelphia 76ers* (4)  51  31  0.622   2.0  109.9  107.3  2.57
4      Toronto Raptors* (5)  48  34  0.585   5.0  109.4  107.1  2.38
5        Chicago Bulls* (6)  46  36  0.561   7.0  111.6  112.0 -0.38
6        Brooklyn Nets* (7)  44  38  0.537   9.0  112.9  112.1  0.82
7   Cleveland Cavaliers (8)  44  38  0.537   9.0  107.8  105.7  2.04
8        Atlanta Hawks* (9)  43  39  0.524  10.0  113.9  112.4  1.55
9    Charlotte Hornets (10)  43  39  0.524  10.0  115.3  114.9  0.53
10     New York Knicks (11)  37  45  0.451  16.0  106.5  106.6 -0.01
11  Washington Wizards (12)  35  47  0.427  18.0  108.6  112.0 -3.23
12      Indiana Pacers (13)  25  57  0.305  28.0  111.5  114.9 -3.26
13     Detroit Pistons (14)  23  59  0.280  30.0  104.8  112.5 -7.36
14       Orlando Magic (15)  22  60  0.268  31.0  104.2  112.2 -7.67
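
The question also mentions storing the standings in a database. As a rough sketch only: the first column mixes the team name, an asterisk and the seed, so it probably makes sense to split it before saving. The example below uses a few rows copied from the output above and the standard-library sqlite3 module. The table name `standings`, the column names and the helpers `split_label` and `save_standings` are all made up for illustration, and the code only records whether a label carries the asterisk without assuming what the site means by it.

```python
import re
import sqlite3

# A few rows copied from the output above: (label, wins, losses).
rows = [
    ("Miami Heat* (1)", 53, 29),
    ("Cleveland Cavaliers (8)", 44, 38),
    ("Orlando Magic (15)", 22, 60),
]

# "Miami Heat* (1)" -> name, optional asterisk, seed in parentheses.
LABEL_RE = re.compile(r"^(?P<name>.+?)(?P<star>\*)?\s*\((?P<seed>\d+)\)$")


def split_label(label):
    """Split a label such as 'Miami Heat* (1)' into (name, starred, seed)."""
    m = LABEL_RE.match(label.strip())
    if m is None:
        raise ValueError(f"unexpected team label: {label!r}")
    return m.group("name").strip(), m.group("star") is not None, int(m.group("seed"))


def save_standings(conn, conference, rows):
    """Create the (hypothetical) standings table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS standings ("
        "conference TEXT, team TEXT, starred INTEGER, seed INTEGER, "
        "wins INTEGER, losses INTEGER)"
    )
    records = []
    for label, wins, losses in rows:
        name, starred, seed = split_label(label)
        records.append((conference, name, int(starred), seed, wins, losses))
    conn.executemany("INSERT INTO standings VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
save_standings(conn, "East", rows)
for record in conn.execute(
    "SELECT team, seed, wins, losses FROM standings ORDER BY seed"
):
    print(record)
```

With a DataFrame from read_html, the same rows could be built by iterating over the frame, or the frame could be written directly with its own to_sql method.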

But if you need BeautifulSoup and Requests, here is an example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2022_standings.html'
soup = BeautifulSoup(requests.get(url).text, features='lxml')
confs_standings_E = soup.find('table', attrs={'id': 'confs_standings_E'})
for stats in confs_standings_E.find_all('tr', class_='full_table'):
    team_name = stats.find('th', attrs={'data-stat': 'team_name'}).getText().strip()
    wins = stats.find('td', attrs={'data-stat': 'wins'}).getText().strip()
    losses = stats.find('td', attrs={'data-stat': 'losses'}).getText().strip()
    win_loss_pct = stats.find('td', attrs={'data-stat': 'win_loss_pct'}).getText().strip()
    gb = stats.find('td', attrs={'data-stat': 'gb'}).getText().strip()
    pts_per_g = stats.find('td', attrs={'data-stat': 'pts_per_g'}).getText().strip()
    opp_pts_per_g = stats.find('td', attrs={'data-stat': 'opp_pts_per_g'}).getText().strip()
    srs = stats.find('td', attrs={'data-stat': 'srs'}).getText().strip()
    print(team_name, wins, losses, win_loss_pct, gb, pts_per_g, opp_pts_per_g, srs)

OUTPUT:

Miami Heat* (1) 53 29 .646 — 110.0 105.6 4.23
Boston Celtics* (2) 51 31 .622 2.0 111.8 104.5 7.02
Milwaukee Bucks* (3) 51 31 .622 2.0 115.5 112.1 3.22
Philadelphia 76ers* (4) 51 31 .622 2.0 109.9 107.3 2.57
Toronto Raptors* (5) 48 34 .585 5.0 109.4 107.1 2.38
Chicago Bulls* (6) 46 36 .561 7.0 111.6 112.0 -0.38
Brooklyn Nets* (7) 44 38 .537 9.0 112.9 112.1 0.82
Cleveland Cavaliers (8) 44 38 .537 9.0 107.8 105.7 2.04
Atlanta Hawks* (9) 43 39 .524 10.0 113.9 112.4 1.55
Charlotte Hornets (10) 43 39 .524 10.0 115.3 114.9 0.53
New York Knicks (11) 37 45 .451 16.0 106.5 106.6 -0.01
Washington Wizards (12) 35 47 .427 18.0 108.6 112.0 -3.23
Indiana Pacers (13) 25 57 .305 28.0 111.5 114.9 -3.26
Detroit Pistons (14) 23 59 .280 30.0 104.8 112.5 -7.36
Orlando Magic (15) 22 60 .268 31.0 104.2 112.2 -7.67
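
Since the same data is needed for the other tables, the loop above could be generalized instead of naming each column by hand. This is only a sketch: it reads every cell that has a data-stat attribute and can be called once per table id. The helper `parse_standings` is hypothetical, the small `SAMPLE_HTML` stands in for the real page so the code runs offline, and `confs_standings_W` is a guess by analogy with the Eastern id, so check the page source for the actual ids.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page so the sketch runs without a request.
SAMPLE_HTML = """
<table id="confs_standings_E"><tbody>
<tr class="full_table">
  <th data-stat="team_name"><a href="#">Miami Heat</a>* (1)</th>
  <td data-stat="wins">53</td><td data-stat="losses">29</td>
</tr>
<tr class="full_table">
  <th data-stat="team_name"><a href="#">Boston Celtics</a>* (2)</th>
  <td data-stat="wins">51</td><td data-stat="losses">31</td>
</tr>
</tbody></table>
"""


def parse_standings(soup, table_id):
    """Return one dict per team row, keyed by each cell's data-stat value."""
    table = soup.find('table', id=table_id)
    if table is None:  # this id is not present in the parsed HTML
        return []
    records = []
    for row in table.find_all('tr', class_='full_table'):
        # every <th>/<td> in the row that carries a data-stat attribute
        cells = row.find_all(['th', 'td'], attrs={'data-stat': True})
        records.append({c['data-stat']: c.get_text().strip() for c in cells})
    return records


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# The second id is an assumption; it is simply skipped if the table is missing.
for table_id in ('confs_standings_E', 'confs_standings_W'):
    for record in parse_standings(soup, table_id):
        print(table_id, record)
```

For the live page, build `soup` from `requests.get(url).text` as in the example above and pass whichever table ids the page actually uses.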