I'm trying to build a simple web scraper to scrape a table, but I'm not sure why the output is: School, 20-5, 33.2
repeated 26 times.
Here is my code:
from bs4 import BeautifulSoup
import requests
url = 'https://www.maxpreps.com/rankings/basketball/1/state/michigan.htm'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
teams = soup.find_all('tr')
for team in teams:
    teamname = soup.find('th', class_ = "school").text
    record = soup.find('td', class_= "overall dw").text
    rating = soup.find('td', class_ = "rating sorted dw").text
    print(teamname, record, rating)
CodePudding user response:
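Notice that you're never using the Tag that team refers to. soup.find() searches the whole document and returns its first match, so every iteration prints the same values. Inside the for loop, all of the calls to soup.find() should be calls to team.find():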
for team in teams[1:]:
    teamname = team.find('th', class_ = "school").text
    record = team.find('td', class_= "overall dw").text
    rating = team.find('td', class_ = "rating sorted dw").text
    print(teamname, record, rating)
This outputs:
St. Mary's Prep (Orchard Lake) 20-5 33.2
University of Detroit Jesuit (Detroit) 16-7 30.0
Williamston 25-0 29.3
Ferndale 21-3 28.9
Catholic Central (Grand Rapids) 25-1 28.4
King (Detroit) 18-3 27.4
De La Salle Collegiate (Warren) 18-7 27.2
Catholic Central (Novi) 16-9 26.6
Brother Rice (Bloomfield Hills) 15-7 26.5
Unity Christian (Hudsonville) 21-1 26.4
Hamtramck 21-4 26.3
Grand Blanc 20-5 25.9
East Lansing 18-5 25.0
Muskegon 20-3 24.8
Northview (Grand Rapids) 25-1 24.6
Cass Tech (Detroit) 21-4 24.3
North Farmington (Farmington Hills) 18-4 24.2
Beecher (Flint) 23-2 24.0
Okemos 19-5 23.9
Benton Harbor 23-3 23.2
Rockford 19-3 22.9
Grand Haven 17-4 21.9
Hartland 19-4 21.0
Marshall 20-3 21.0
Freeland 24-0 21.0
We use [1:] to skip the table header, slicing off the first element in the teams list.
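As a rough illustration of the difference between searching the whole document and searching a single row, here is a minimal sketch. SAMPLE_HTML is a made-up stand-in for the real page (the actual markup may differ), and parse_row is a hypothetical helper, not something from the question. Instead of slicing with [1:], it skips any row that lacks one of the expected cells, which is one possible way to handle the header row.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rankings table; only the class names come from the question.
SAMPLE_HTML = """
<table>
  <tr><th class="school">School</th><th class="overall dw">Ovr.</th><th class="rating sorted dw">Rating</th></tr>
  <tr><th class="school">Williamston</th><td class="overall dw">25-0</td><td class="rating sorted dw">29.3</td></tr>
  <tr><th class="school">Ferndale</th><td class="overall dw">21-3</td><td class="rating sorted dw">28.9</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find_all("tr")

# soup.find() searches the whole document, so it returns the same first match for every row.
unscoped = [soup.find("th", class_="school").text for _ in rows]


def parse_row(row):
    """Return (school, record, rating) for a data row, or None if a cell is missing."""
    school = row.find("th", class_="school")
    record = row.find("td", class_="overall")
    rating = row.find("td", class_="rating")
    if school is None or record is None or rating is None:
        return None  # e.g. the header row, which has no <td> cells
    return school.text.strip(), record.text.strip(), rating.text.strip()


# row.find() only searches inside that row, so each row yields its own values.
parsed_rows = [parse_row(row) for row in rows]
scoped = [parsed for parsed in parsed_rows if parsed is not None]

print(unscoped)
print(scoped)
```

Matching on a single class such as "overall" works because BeautifulSoup compares class_ against each of the tag's classes individually, so the lookup does not depend on the exact order of the full class string.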
CodePudding user response:
Let pandas parse that table for you (it uses lxml, or BeautifulSoup with html5lib, under the hood).
import pandas as pd
url = 'https://www.maxpreps.com/rankings/basketball/1/state/michigan.htm'
df = pd.read_html(url)[0]
print(df)
Output:
     #                                  School  Ovr.  Rating  Str.   +/-
 0   1          St. Mary's Prep (Orchard Lake)  20-5    33.2  23.0   NaN
 1   2  University of Detroit Jesuit (Detroit)  16-7    30.0  24.1   NaN
 2   3                             Williamston  25-0    29.3  10.9   NaN
 3   4                                Ferndale  21-3    28.9  16.5   NaN
 4   5         Catholic Central (Grand Rapids)  25-1    28.4  11.4   NaN
 5   6                          King (Detroit)  18-3    27.4  15.2   NaN
 6   7         De La Salle Collegiate (Warren)  18-7    27.2  19.6   2.0
 7   8                 Catholic Central (Novi)  16-9    26.6  22.6  -1.0
 8   9         Brother Rice (Bloomfield Hills)  15-7    26.5  21.0  -1.0
 9  10           Unity Christian (Hudsonville)  21-1    26.4  10.4   NaN
10  11                               Hamtramck  21-4    26.3  14.5   2.0
11  12                             Grand Blanc  20-5    25.9  15.3  -1.0
12  13                            East Lansing  18-5    25.0  15.6   1.0
13  14                                Muskegon  20-3    24.8  11.4   1.0
14  15                Northview (Grand Rapids)  25-1    24.6   8.2   1.0
15  16                     Cass Tech (Detroit)  21-4    24.3  11.8  -4.0
16  17     North Farmington (Farmington Hills)  18-4    24.2  13.1   NaN
17  18                         Beecher (Flint)  23-2    24.0   8.6   2.0
18  19                                  Okemos  19-5    23.9  13.7  -1.0
19  20                           Benton Harbor  23-3    23.2   9.9  -1.0
20  21                                Rockford  19-3    22.9  11.6   NaN
21  22                             Grand Haven  17-4    21.9  11.3   NaN
22  23                                Hartland  19-4    21.0  10.4   1.0
23  24                                Marshall  20-3    21.0   8.6  -1.0
24  25                                Freeland  24-0    21.0   2.7   4.0
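
If you only need the three values from the question, you can select those columns from the DataFrame. The sketch below runs against a small made-up HTML string (SAMPLE_HTML is an assumption, not the real page) so it works offline. It assumes an HTML parser that pandas can use is installed, such as lxml.

```python
from io import StringIO

import pandas as pd

# Hypothetical, trimmed-down stand-in for the rankings table.
SAMPLE_HTML = """<table>
<tr><th>#</th><th>School</th><th>Ovr.</th><th>Rating</th></tr>
<tr><td>1</td><td>Williamston</td><td>25-0</td><td>29.3</td></tr>
<tr><td>2</td><td>Ferndale</td><td>21-3</td><td>28.9</td></tr>
</table>"""

# read_html returns a list with one DataFrame per <table> found; take the first.
df = pd.read_html(StringIO(SAMPLE_HTML))[0]

# Keep only the columns the question was printing.
subset = df[["School", "Ovr.", "Rating"]]

for school, record, rating in subset.itertuples(index=False, name=None):
    print(school, record, rating)
```

If the site rejects the request that pandas makes on its own, one option is to download the page with requests as in the question and pass StringIO(page.text) to pd.read_html instead of the URL.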