How to find a div in Beautiful Soup

Time:08-05

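import requests
import pandas as pd
from bs4 import BeautifulSoup

playerstats_url = 'https://www.pro-football-reference.com/boxscores/202110100tam.htm'
# Note: `weeks` and `page` are not defined in the snippet as posted
for week in weeks:
    url1 = playerstats_url.format(week)
    data1 = requests.get(url1)

    # Save a local copy of each page
    with open('player/{}.html'.format(week), 'w') as f:
        f.write(data1.text)

soup = BeautifulSoup(page, 'html.parser')

# Attribute filters must be passed as a dict
week1_stats = soup.find('div', {'id': 'team_stats'})

tam2021 = pd.read_html(str(week1_stats))[0]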

I am trying to pull the 'Team Stats' table from the Pro Football Reference website, but I keep getting 'ValueError: No tables found'.

CodePudding user response:

This worked for me...

import requests
from bs4 import BeautifulSoup
import pandas as pd

html = requests.get('https://www.pro-football-reference.com/boxscores/202110100tam.htm')
soup = BeautifulSoup(html.text, 'html.parser')
stats = soup.find('div', {'id':'all_player_offense'})
pd.read_html(str(stats))

Which returns...

[   Unnamed: 0_level_0 Unnamed: 1_level_0  Passing                                                                          Rushing                             Receiving                                              Fumbles
               Player                 Tm      Cmp      Att      Yds       TD      Int       Sk    Yds.1      Lng     Rate      Att      Yds       TD      Lng        Tgt        Rec        Yds         TD        Lng      Fmb       FL
0     Jacoby Brissett                MIA       27       39      275        2        1        3       13       34     95.6        0        0        0        0          0          0          0          0          0        1        1
1        Myles Gaskin                MIA        0        0        0        0        0        0        0        0      NaN        5       25        0       13         10         10         74          2         24        0        0
2    Preston Williams                MIA        0        0        0        0        0        0        0        0      NaN        1        7        0        7          5          3         60          0         34        0        0
3        Salvon Ahmed                MIA        0        0        0        0        0        0        0        0      NaN        2        5        0        4          3          2         16          0         11        0        0
4       Jaylen Waddle                MIA        0        0        0        0        0        0        0        0      NaN        1        2        0        2          6          2         31          0         21        0        0
5        Mike Gesicki                MIA        0        0        0        0        0        0        0        0      NaN        0        0        0        0          7          4         43          0         23        0        0
6       Durham Smythe                MIA        0        0        0        0        0        0        0        0      NaN        0        0        0        0          3          2         23          0         21        0        0
7        Adam Shaheen                MIA        0        0        0        0        0        0        0        0      NaN        0        0        0        0          2          2         15          0         10        0        0
8        Mack Hollins                MIA        0        0        0        0        0        0        0        0      NaN        0        0        0        0          2          1         10          0         10        0        0
9         Isaiah Ford                MIA        0        0        0        0        0        0        0        0      NaN        0        0        0        0          1          1          3          0          3        0        0
10                NaN                NaN  Passing  Passing  Passing  Passing  Passing  Passing  Passing  Passing  Passing  Rushing  Rushing  Rushing  Rushing  Receiving  Receiving  Receiving  Receiving  Receiving  Fumbles  Fumbles
11             Player                 Tm      Cmp      Att      Yds       TD      Int       Sk      Yds      Lng     Rate      Att      Yds       TD      Lng        Tgt        Rec        Yds         TD        Lng      Fmb       FL
12          Tom Brady                TAM       30       41      411        5        0        2       15       62    144.4        1       13        0       13          0          0          0          0          0        0        0
13     Blaine Gabbert                TAM        3        3       41        0        0        0        0       23    118.7        3       -1        0        0          0          0          0          0          0        0        0
14  Leonard Fournette                TAM        0        0        0        0        0        0        0        0      NaN       12       67        1       17          5          4         43          0         16        0        0
15    Ronald Jones II                TAM        0        0        0        0        0        0        0        0      NaN        5       21        0        5          1          1         15          0         15        0        0
16    Giovani Bernard                TAM        0        0        0        0        0        0        0        0      NaN        4       21        0       17          2          2         14          1         10        0        0
17      Antonio Brown                TAM        0        0        0        0        0        0        0        0      NaN        0        0        0        0          8          7        124          2         62        0        0
18         Mike Evans                TAM        0        0        0        0        0        0        0        0      NaN        0        0        0        0          8          6        113          2         34        0        0
19       Chris Godwin                TAM        0        0        0        0        0        0        0        0      NaN        0        0        0        0         11          7         70          0         18        0        0
20      Tyler Johnson                TAM        0        0        0        0        0        0        0        0      NaN        0        0        0        0          3          3         42          0         19        0        0
21        O.J. Howard                TAM        0        0        0        0        0        0        0        0      NaN        0        0        0        0          3          2         19          0         10        0        0
22      Cameron Brate                TAM        0        0        0        0        0        0        0        0      NaN        0        0        0        0          1          1         12          0         12        0        0]

EDIT FOR UPDATED QUESTION

I found that the 'Team Stats' table is inside an HTML comment in the markup that requests returns, so BeautifulSoup does not see it as a tag. I think the version shown on the site is loaded dynamically, and the requests library cannot handle pages that use JavaScript to request data.

The solution below works fine: it strips the comment markers before parsing. If you need content that really is loaded dynamically, you could try this library instead: https://pypi.org/project/requests-html/

import requests
from bs4 import BeautifulSoup
import pandas as pd

html = requests.get('https://www.pro-football-reference.com/boxscores/202110100tam.htm')
data = html.text.replace('<!--','').replace('-->','')
soup = BeautifulSoup(data, 'html.parser')
stats = soup.find('div', {'id':'div_team_stats'})
pd.read_html(str(stats))

This returns...

[            Unnamed: 0            MIA            TAM
0          First Downs             17             33
1         Rush-Yds-TDs         9-39-0       25-121-1
2    Cmp-Att-Yd-TD-INT  27-39-275-2-1  33-44-452-5-0
3         Sacked-Yards           3-13           2-15
4       Net Pass Yards            262            437
5          Total Yards            301            558
6         Fumbles-Lost            1-1            0-0
7            Turnovers              2              0
8      Penalties-Yards           5-37           6-47
9     Third Down Conv.            2-7           8-11
10   Fourth Down Conv.            0-0            0-0
11  Time of Possession          22:53          37:07]
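
If you would rather not strip every comment marker in the page, a possible alternative is to parse only the comments themselves. The sketch below is untested against the live site: SAMPLE_HTML is a made-up fragment that imitates the commented-out table, and find_in_comments is a hypothetical helper name, not part of bs4. It also shows why a plain soup.find comes back empty.

```python
from bs4 import BeautifulSoup, Comment

# Made-up fragment imitating a page that ships its table inside an HTML comment.
SAMPLE_HTML = """
<div id="all_team_stats">
<!--
<div id="div_team_stats">
<table id="team_stats">
<tr><th></th><th>MIA</th><th>TAM</th></tr>
<tr><th>First Downs</th><td>17</td><td>33</td></tr>
</table>
</div>
-->
</div>
"""


def find_in_comments(html, name, attrs):
    """Return the first matching tag found inside any HTML comment, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # Comment nodes are strings in the tree, so filter the strings by type.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Re-parse the comment body as its own small document.
        inner = BeautifulSoup(comment, 'html.parser')
        match = inner.find(name, attrs)
        if match is not None:
            return match
    return None


# A direct search misses the div because it only exists as comment text.
direct = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('div', {'id': 'div_team_stats'})
print(direct)

stats = find_in_comments(SAMPLE_HTML, 'div', {'id': 'div_team_stats'})
rows = [[cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        for tr in stats.find_all('tr')]
print(rows)
```

Since find returns None when nothing matches, str(None) is just the text 'None', which is why pd.read_html reports that no tables were found. Once stats is a real tag, pd.read_html(str(stats)) should work the same way as in the code above.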