I am new to BS4 (BeautifulSoup) and am trying to scrape data for a specific HTML class. A snippet of my HTML looks like the following:
<td class="right">31</td>
<td class="right gamelink">
<a href="/boxscores/20220908ram.htm">
F
<span class="no_mobile">inal</span>
</a>
</td>
The problem is that when I call findAll() for the class "right", I also get the contents of cells with the class "right gamelink". Is there a way to return text only from cells whose class is exactly "right", and not from "right gamelink"?
Code:
from bs4 import BeautifulSoup
import requests

weekNumber = 1
# Build the week URL; the pieces must be joined with +
url = "https://www.pro-football-reference.com/years/2022/week_" + str(weekNumber) + ".htm"
print(url)

req = requests.get(url)
webpage = BeautifulSoup(req.text, 'html.parser')

# findAll is the older alias of find_all; this matches every <td> whose class list contains "right"
scores = webpage.findAll("td", attrs={'class': 'right'})
for score in scores:
    current_score = score.text.strip()
    print(current_score)
Output:
31
Final
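The behavior comes from how BeautifulSoup treats class: it is a multi-valued attribute, so tag['class'] is a list, and searching for 'right' matches any tag whose list contains 'right', including "right gamelink". A minimal offline sketch, assuming the two cells carry class="right" and class="right gamelink" as the question describes, reproduces this and then keeps only cells whose whole class list is exactly ['right'] by passing a function to find_all:

```python
from bs4 import BeautifulSoup

# Markup modeled on the question; the class values are assumed from its description.
html = """
<table><tr>
<td class="right">31</td>
<td class="right gamelink">
<a href="/boxscores/20220908ram.htm">F<span class="no_mobile">inal</span></a>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# class is multi-valued, so this matches every <td> whose class list contains "right".
loose = [td.get_text(strip=True) for td in soup.find_all("td", class_="right")]

# A function filter compares the entire class list, so ["right", "gamelink"] is excluded.
exact = [
    td.get_text(strip=True)
    for td in soup.find_all(lambda tag: tag.name == "td" and tag.get("class") == ["right"])
]

print(loose)  # ['31', 'Final']
print(exact)  # ['31']
```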
Answer:
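Use a CSS selector instead, with an attribute selector that matches the exact class value. Change
scores = webpage.findAll("td", attrs={'class': 'right'})
to
scores = webpage.select('td[class="right"]')
and see if it works. The selector [class="right"] compares the whole class string, so cells with the class "right gamelink" are not matched.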
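A small sketch of the selector approach on a hypothetical inline table modeled on the question's snippet. It also shows a :not() variant that keeps the .right class test but excludes cells that additionally have .gamelink; both rely on the CSS support (Soup Sieve) that ships with current BeautifulSoup releases:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the question: one plain "right" cell and one "right gamelink" cell.
html = """
<table><tr>
<td class="right">31</td>
<td class="right gamelink"><a href="/boxscores/20220908ram.htm">Final</a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# [class="right"] compares the full attribute string, so "right gamelink" does not match.
by_attribute = [td.get_text(strip=True) for td in soup.select('td[class="right"]')]

# Alternative: require the .right class but exclude cells that also carry .gamelink.
by_not = [td.get_text(strip=True) for td in soup.select("td.right:not(.gamelink)")]

print(by_attribute)  # ['31']
print(by_not)        # ['31']
```

The :not() form is a little more robust if the site ever adds another class to the score cells, since it does not depend on the attribute string being exactly "right".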
Answer:
Maybe this will be useful for you. Instead of collecting individual cells, it parses each game summary box on the week page:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com/years/2022/week_1.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')  # requires the lxml package to be installed

results = []
for game in soup.find_all('div', class_='game_summary expanded nohover'):
    # Each row of the "teams" table becomes a list of its non-empty text fragments:
    # row 0 is the date, row 1 the visiting team, row 2 the home team.
    teams = []
    for x in game.find('table', class_='teams').find_all('tr'):
        teams.append(list(filter(None, [a.strip() for a in x.get_text().split('\n')])))
    results.append({
        'Date': teams[0][0],
        'Away': {
            'Name': teams[1][0],
            'Score': teams[1][1]
        },
        'Home': {
            'Name': teams[2][0],
            'Score': teams[2][1]
        },
        # "Final" ends the first team row; a third field on the second row (e.g. "OT") is appended
        'Result': teams[1][-1] if len(teams[2]) < 3 else f'{teams[1][-1]} {teams[2][-1]}'
    })

df = pd.DataFrame(results)
print(df.to_string(index=False))
Output:
Date Away Home Result
Sep 8, 2022 {'Name': 'Buffalo Bills', 'Score': '31'} {'Name': 'Los Angeles Rams', 'Score': '10'} Final
Sep 11, 2022 {'Name': 'New Orleans Saints', 'Score': '27'} {'Name': 'Atlanta Falcons', 'Score': '26'} Final
Sep 11, 2022 {'Name': 'Cleveland Browns', 'Score': '26'} {'Name': 'Carolina Panthers', 'Score': '24'} Final
Sep 11, 2022 {'Name': 'San Francisco 49ers', 'Score': '10'} {'Name': 'Chicago Bears', 'Score': '19'} Final
Sep 11, 2022 {'Name': 'Pittsburgh Steelers', 'Score': '23'} {'Name': 'Cincinnati Bengals', 'Score': '20'} Final OT
Sep 11, 2022 {'Name': 'Philadelphia Eagles', 'Score': '38'} {'Name': 'Detroit Lions', 'Score': '35'} Final
Sep 11, 2022 {'Name': 'Indianapolis Colts', 'Score': '20'} {'Name': 'Houston Texans', 'Score': '20'} Final OT
Sep 11, 2022 {'Name': 'New England Patriots', 'Score': '7'} {'Name': 'Miami Dolphins', 'Score': '20'} Final
Sep 11, 2022 {'Name': 'Baltimore Ravens', 'Score': '24'} {'Name': 'New York Jets', 'Score': '9'} Final
Sep 11, 2022 {'Name': 'Jacksonville Jaguars', 'Score': '22'} {'Name': 'Washington Commanders', 'Score': '28'} Final
Sep 11, 2022 {'Name': 'Kansas City Chiefs', 'Score': '44'} {'Name': 'Arizona Cardinals', 'Score': '21'} Final
Sep 11, 2022 {'Name': 'Green Bay Packers', 'Score': '7'} {'Name': 'Minnesota Vikings', 'Score': '23'} Final
Sep 11, 2022 {'Name': 'New York Giants', 'Score': '21'} {'Name': 'Tennessee Titans', 'Score': '20'} Final
Sep 11, 2022 {'Name': 'Las Vegas Raiders', 'Score': '19'} {'Name': 'Los Angeles Chargers', 'Score': '24'} Final
Sep 11, 2022 {'Name': 'Tampa Bay Buccaneers', 'Score': '19'} {'Name': 'Dallas Cowboys', 'Score': '3'} Final
Sep 12, 2022 {'Name': 'Denver Broncos', 'Score': '16'} {'Name': 'Seattle Seahawks', 'Score': '17'} Final
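To see what the teams list holds for one game, here is a small offline sketch that runs the same newline-split-and-filter step over hypothetical markup for a single game box. The HTML is an assumption shaped after what the answer's code expects, not copied from the site, so the real page may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one game box, shaped the way the answer's parser expects.
html = """
<div class="game_summary expanded nohover">
<table class="teams">
<tr class="date"><td>Sep 8, 2022</td></tr>
<tr>
<td>Buffalo Bills</td>
<td class="right">31</td>
<td class="right gamelink">
<a href="/boxscores/20220908ram.htm">Final</a>
</td>
</tr>
<tr>
<td>Los Angeles Rams</td>
<td class="right">10</td>
<td class="right">
</td>
</tr>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
game = soup.find("div", class_="game_summary expanded nohover")

rows = []
for tr in game.find("table", class_="teams").find_all("tr"):
    # Split the row text on newlines, strip each piece, and drop empty fragments.
    rows.append(list(filter(None, [part.strip() for part in tr.get_text().split("\n")])))

# Same Result rule as the answer: append the second row's last field only if it has three fields.
result = rows[1][-1] if len(rows[2]) < 3 else f"{rows[1][-1]} {rows[2][-1]}"

print(rows)    # [['Sep 8, 2022'], ['Buffalo Bills', '31', 'Final'], ['Los Angeles Rams', '10']]
print(result)  # Final
```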
Alternatively, you can replace the results.append(...) call inside the loop with a flat dictionary:
results.append({
    'Date': teams[0][0],
    'Away Team': teams[1][0],  # the first team row is the visiting team
    'Home Team': teams[2][0],
    'Score': f'{teams[1][1]}-{teams[2][1]}',  # away score first
    'Result': teams[1][-1] if len(teams[2]) < 3 else f'{teams[1][-1]} {teams[2][-1]}'
})
The table then looks like this:
Date Away Team Home Team Score Result
Sep 8, 2022 Buffalo Bills Los Angeles Rams 31-10 Final
Sep 11, 2022 New Orleans Saints Atlanta Falcons 27-26 Final
Sep 11, 2022 Cleveland Browns Carolina Panthers 26-24 Final
Sep 11, 2022 San Francisco 49ers Chicago Bears 10-19 Final
Sep 11, 2022 Pittsburgh Steelers Cincinnati Bengals 23-20 Final OT
Sep 11, 2022 Philadelphia Eagles Detroit Lions 38-35 Final
Sep 11, 2022 Indianapolis Colts Houston Texans 20-20 Final OT
Sep 11, 2022 New England Patriots Miami Dolphins 7-20 Final
Sep 11, 2022 Baltimore Ravens New York Jets 24-9 Final
Sep 11, 2022 Jacksonville Jaguars Washington Commanders 22-28 Final
Sep 11, 2022 Kansas City Chiefs Arizona Cardinals 44-21 Final
Sep 11, 2022 Green Bay Packers Minnesota Vikings 7-23 Final
Sep 11, 2022 New York Giants Tennessee Titans 21-20 Final
Sep 11, 2022 Las Vegas Raiders Los Angeles Chargers 19-24 Final
Sep 11, 2022 Tampa Bay Buccaneers Dallas Cowboys 19-3 Final
Sep 12, 2022 Denver Broncos Seattle Seahawks 16-17 Final