Returning Text From Specific Class BeautifulSoup4

Time:09-29

I am new to BS4 and trying to scrape data for a specific HTML class. A snippet of my HTML looks like the following:

<td class="right">31</td>
<td class="right gamelink">
   <a href="/boxscores/20220908ram.htm">
      F
      <span class="no_mobile">inal</span>
   </a>
</td>

The problem I am having is that when I call findAll() for the class "right", I also get the contents of elements with the class "right gamelink". Is there a way to specify that the returned text should only come from elements whose class is exactly "right", and not from those with "right gamelink"?

Code:

from bs4 import BeautifulSoup
import requests


weekNumber = 1
url = "https://www.pro-football-reference.com/years/2022/week_" + str(weekNumber) + ".htm"

print(url)

req = requests.get(url)
webpage = BeautifulSoup(req.text, 'html.parser')

scores = webpage.findAll("td", attrs={'class': 'right'})

for score in scores:
    current_score = score.text.strip()
    print(current_score)

Output:

31
Final

CodePudding user response:

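Use a CSS selector instead, in the format below. An attribute selector compares the whole class string, so it only matches elements whose class is exactly "right". Change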

scores = webpage.findAll("td", attrs={'class': 'right'})

to

scores = webpage.select('td[class="right"]')

and see if it works.
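
To illustrate the difference, here is a minimal sketch that runs against an inline HTML string instead of the live page. The markup is an assumption modelled on the snippet in the question, and the variable names (`loose`, `exact`, `exact_filter`) are made up for the example. It compares matching on a single class name with matching on the exact class attribute.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the real page (assumed markup, based on the question's snippet).
html = """
<table><tr>
<td class="right">31</td>
<td class="right gamelink">
   <a href="/boxscores/20220908ram.htm">F<span class="no_mobile">inal</span></a>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching by one class name matches any element that has that class,
# including elements that carry additional classes.
loose = [td.get_text(strip=True) for td in soup.find_all("td", class_="right")]

# The attribute selector compares the full class string, so "right gamelink" is excluded.
exact = [td.get_text(strip=True) for td in soup.select('td[class="right"]')]

# The same filter without CSS: compare the parsed class list directly.
exact_filter = [
    td.get_text(strip=True)
    for td in soup.find_all("td")
    if td.get("class") == ["right"]
]

print(loose)
print(exact)
print(exact_filter)
```

The first list contains both cells, while the other two contain only the score cell. Which of the two exact-match forms to use is mostly a matter of taste; the selector version is shorter, and the list comparison makes the multi-valued nature of the class attribute explicit.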

CodePudding user response:

Maybe this will be useful for you:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = 'https://www.pro-football-reference.com/years/2022/week_1.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
results = []
for game in soup.find_all('div', class_='game_summary expanded nohover'):
    teams = []
    for x in game.find('table', class_='teams').find_all('tr'):
        teams.append(list(filter(None, [a.strip() for a in x.get_text().split('\n')])))
    results.append({
        'Date': teams[0][0],
        'Home': {
            'Name': teams[1][0],
            'Score': teams[1][1]
        },
        'Guest': {
            'Name': teams[2][0],
            'Score': teams[2][1]
        },
        'Result': teams[1][-1] if len(teams[2]) < 3 else f'{teams[1][-1]} {teams[2][-1]}'
    })
df = pd.DataFrame(results)
print(df.to_string(index=False))

OUTPUT:

        Date                                            Home                                            Guest   Result
 Sep 8, 2022        {'Name': 'Buffalo Bills', 'Score': '31'}      {'Name': 'Los Angeles Rams', 'Score': '10'}    Final
Sep 11, 2022   {'Name': 'New Orleans Saints', 'Score': '27'}       {'Name': 'Atlanta Falcons', 'Score': '26'}    Final
Sep 11, 2022     {'Name': 'Cleveland Browns', 'Score': '26'}     {'Name': 'Carolina Panthers', 'Score': '24'}    Final
Sep 11, 2022  {'Name': 'San Francisco 49ers', 'Score': '10'}         {'Name': 'Chicago Bears', 'Score': '19'}    Final
Sep 11, 2022  {'Name': 'Pittsburgh Steelers', 'Score': '23'}    {'Name': 'Cincinnati Bengals', 'Score': '20'} Final OT
Sep 11, 2022  {'Name': 'Philadelphia Eagles', 'Score': '38'}         {'Name': 'Detroit Lions', 'Score': '35'}    Final
Sep 11, 2022   {'Name': 'Indianapolis Colts', 'Score': '20'}        {'Name': 'Houston Texans', 'Score': '20'} Final OT
Sep 11, 2022  {'Name': 'New England Patriots', 'Score': '7'}        {'Name': 'Miami Dolphins', 'Score': '20'}    Final
Sep 11, 2022     {'Name': 'Baltimore Ravens', 'Score': '24'}          {'Name': 'New York Jets', 'Score': '9'}    Final
Sep 11, 2022 {'Name': 'Jacksonville Jaguars', 'Score': '22'} {'Name': 'Washington Commanders', 'Score': '28'}    Final
Sep 11, 2022   {'Name': 'Kansas City Chiefs', 'Score': '44'}     {'Name': 'Arizona Cardinals', 'Score': '21'}    Final
Sep 11, 2022     {'Name': 'Green Bay Packers', 'Score': '7'}     {'Name': 'Minnesota Vikings', 'Score': '23'}    Final
Sep 11, 2022      {'Name': 'New York Giants', 'Score': '21'}      {'Name': 'Tennessee Titans', 'Score': '20'}    Final
Sep 11, 2022    {'Name': 'Las Vegas Raiders', 'Score': '19'}  {'Name': 'Los Angeles Chargers', 'Score': '24'}    Final
Sep 11, 2022 {'Name': 'Tampa Bay Buccaneers', 'Score': '19'}         {'Name': 'Dallas Cowboys', 'Score': '3'}    Final
Sep 12, 2022       {'Name': 'Denver Broncos', 'Score': '16'}      {'Name': 'Seattle Seahawks', 'Score': '17'}    Final

Or you can change the dict to

results.append({
        'Date': teams[0][0],
        'Home Team': teams[1][0],
        'Guest Team': teams[2][0],
        'Score': f'{teams[1][1]}-{teams[2][1]}',
        'Result': teams[1][-1] if len(teams[2]) < 3 else f'{teams[1][-1]} {teams[2][-1]}'
    })

And your table now looks like:

        Date            Home Team            Guest Team Score   Result
 Sep 8, 2022        Buffalo Bills      Los Angeles Rams 31-10    Final
Sep 11, 2022   New Orleans Saints       Atlanta Falcons 27-26    Final
Sep 11, 2022     Cleveland Browns     Carolina Panthers 26-24    Final
Sep 11, 2022  San Francisco 49ers         Chicago Bears 10-19    Final
Sep 11, 2022  Pittsburgh Steelers    Cincinnati Bengals 23-20 Final OT
Sep 11, 2022  Philadelphia Eagles         Detroit Lions 38-35    Final
Sep 11, 2022   Indianapolis Colts        Houston Texans 20-20 Final OT
Sep 11, 2022 New England Patriots        Miami Dolphins  7-20    Final
Sep 11, 2022     Baltimore Ravens         New York Jets  24-9    Final
Sep 11, 2022 Jacksonville Jaguars Washington Commanders 22-28    Final
Sep 11, 2022   Kansas City Chiefs     Arizona Cardinals 44-21    Final
Sep 11, 2022    Green Bay Packers     Minnesota Vikings  7-23    Final
Sep 11, 2022      New York Giants      Tennessee Titans 21-20    Final
Sep 11, 2022    Las Vegas Raiders  Los Angeles Chargers 19-24    Final
Sep 11, 2022 Tampa Bay Buccaneers        Dallas Cowboys  19-3    Final
Sep 12, 2022       Denver Broncos      Seattle Seahawks 16-17    Final