Scraping table returning repeated values

I'm building a simple web scraper to pull data from a table, but I'm not sure why the output is the same line, School 20-5 33.2, repeated 26 times.

Here is my code:

from bs4 import BeautifulSoup
import requests

url = 'https://www.maxpreps.com/rankings/basketball/1/state/michigan.htm'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
teams = soup.find_all('tr')
for team in teams:
    teamname = soup.find('th', class_ = "school").text
    record = soup.find('td', class_= "overall dw").text
    rating = soup.find('td', class_ = "rating sorted dw").text

    print(teamname, record, rating)

CodePudding user response:

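Notice that you're never using the Tag that team refers to. soup.find() searches the whole document and returns the first match, so every iteration prints the same values. Inside the for loop, all of the calls to soup.find() should be calls to team.find():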

for team in teams[1:]:
    teamname = team.find('th', class_ = "school").text
    record = team.find('td', class_= "overall dw").text
    rating = team.find('td', class_ = "rating sorted dw").text
    print(teamname, record, rating)

This outputs:

St. Mary's Prep (Orchard Lake) 20-5 33.2
University of Detroit Jesuit (Detroit) 16-7 30.0
Williamston 25-0 29.3
Ferndale 21-3 28.9
Catholic Central (Grand Rapids) 25-1 28.4
King (Detroit) 18-3 27.4
De La Salle Collegiate (Warren) 18-7 27.2
Catholic Central (Novi) 16-9 26.6
Brother Rice (Bloomfield Hills) 15-7 26.5
Unity Christian (Hudsonville) 21-1 26.4
Hamtramck 21-4 26.3
Grand Blanc 20-5 25.9
East Lansing 18-5 25.0
Muskegon 20-3 24.8
Northview (Grand Rapids) 25-1 24.6
Cass Tech (Detroit) 21-4 24.3
North Farmington (Farmington Hills) 18-4 24.2
Beecher (Flint) 23-2 24.0
Okemos 19-5 23.9
Benton Harbor 23-3 23.2
Rockford 19-3 22.9
Grand Haven 17-4 21.9
Hartland 19-4 21.0
Marshall 20-3 21.0
Freeland 24-0 21.0

We use [1:] to skip the table header, slicing off the first element in the teams list.
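
To see the difference without hitting the site, here is a minimal sketch on a tiny inline table. The HTML is a hypothetical stand-in for the real page (only the class names are taken from the question), so treat it as an illustration rather than the page's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the rankings table; class names copied from the question.
html = """
<table>
  <tr><th class="school">School</th><th class="overall dw">Ovr.</th></tr>
  <tr><th class="school">Williamston</th><td class="overall dw">25-0</td></tr>
  <tr><th class="school">Ferndale</th><td class="overall dw">21-3</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# soup.find() searches the whole document, so it returns the first match on every pass.
whole_doc = [soup.find("th", class_="school").text for _ in rows]

# row.find() searches only inside the current row; rows[1:] skips the header row.
per_row = [row.find("th", class_="school").text for row in rows[1:]]
records = [row.find("td", class_="overall dw").text for row in rows[1:]]

print(whole_doc)  # ['School', 'School', 'School']
print(per_row)    # ['Williamston', 'Ferndale']
print(records)    # ['25-0', '21-3']
```

The first list is the bug from the question in miniature: one repeated value per row, because the search never looks at the row itself.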

CodePudding user response:

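Let pandas parse that table for you (by default it tries lxml and falls back to BeautifulSoup under the hood). read_html returns a list of DataFrames, one per table it finds, so [0] selects the first.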

import pandas as pd

url = 'https://www.maxpreps.com/rankings/basketball/1/state/michigan.htm'
df = pd.read_html(url)[0]

Output:

print(df)
     #                                  School  Ovr.  Rating  Str.  +/-
0    1          St. Mary's Prep (Orchard Lake)  20-5    33.2  23.0  NaN
1    2  University of Detroit Jesuit (Detroit)  16-7    30.0  24.1  NaN
2    3                             Williamston  25-0    29.3  10.9  NaN
3    4                                Ferndale  21-3    28.9  16.5  NaN
4    5         Catholic Central (Grand Rapids)  25-1    28.4  11.4  NaN
5    6                          King (Detroit)  18-3    27.4  15.2  NaN
6    7         De La Salle Collegiate (Warren)  18-7    27.2  19.6  2.0
7    8                 Catholic Central (Novi)  16-9    26.6  22.6 -1.0
8    9         Brother Rice (Bloomfield Hills)  15-7    26.5  21.0 -1.0
9   10           Unity Christian (Hudsonville)  21-1    26.4  10.4  NaN
10  11                               Hamtramck  21-4    26.3  14.5  2.0
11  12                             Grand Blanc  20-5    25.9  15.3 -1.0
12  13                            East Lansing  18-5    25.0  15.6  1.0
13  14                                Muskegon  20-3    24.8  11.4  1.0
14  15                Northview (Grand Rapids)  25-1    24.6   8.2  1.0
15  16                     Cass Tech (Detroit)  21-4    24.3  11.8 -4.0
16  17     North Farmington (Farmington Hills)  18-4    24.2  13.1  NaN
17  18                         Beecher (Flint)  23-2    24.0   8.6  2.0
18  19                                  Okemos  19-5    23.9  13.7 -1.0
19  20                           Benton Harbor  23-3    23.2   9.9 -1.0
20  21                                Rockford  19-3    22.9  11.6  NaN
21  22                             Grand Haven  17-4    21.9  11.3  NaN
22  23                                Hartland  19-4    21.0  10.4  1.0
23  24                                Marshall  20-3    21.0   8.6 -1.0
24  25                                Freeland  24-0    21.0   2.7  4.0
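
If you only want the three fields the question prints, you can select those columns from the frame. The sketch below uses a small hand-built DataFrame as a stand-in for the result of read_html (three rows copied from the output above), so it runs without a network connection; the variable name subset is just an illustrative choice:

```python
import pandas as pd

# Hand-built stand-in for the frame returned by read_html, using rows from the output above.
df = pd.DataFrame({
    "#": [1, 2, 3],
    "School": [
        "St. Mary's Prep (Orchard Lake)",
        "University of Detroit Jesuit (Detroit)",
        "Williamston",
    ],
    "Ovr.": ["20-5", "16-7", "25-0"],
    "Rating": [33.2, 30.0, 29.3],
    "Str.": [23.0, 24.1, 10.9],
})

# Keep only the columns the question prints: team name, record, rating.
subset = df[["School", "Ovr.", "Rating"]]

# Iterate as plain tuples to mirror the original print loop.
for teamname, record, rating in subset.itertuples(index=False, name=None):
    print(teamname, record, rating)
```

With the real page, the same column selection should apply to pd.read_html(url)[0], assuming the column labels match the output shown above.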