BeautifulSoup and Selenium won't retrieve full html from website

Time:02-18

This is the site I'm trying to retrieve information from: https://www.baseball-reference.com/boxes/CLE/CLE202108120.shtml I want to get the box score data, such as the Oakland A's team batting average and at bats for the game. However, when I retrieve and print the HTML from the site, the box score tables are missing completely. Any suggestions? Thanks. Here's my code:

from bs4 import BeautifulSoup
import requests

url = "https://www.baseball-reference.com/boxes/CLE/CLE202108120.shtml"

page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')

print(soup.prettify())

Please help! Thanks! I tried Selenium and had the same problem.

CodePudding user response:

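The page is built by JavaScript, so a plain requests.get() does not return the rendered tables. Try the requests_html package instead, which can render the page before you parse it. Note that the first call to render() downloads a copy of Chromium. See the sample below.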

from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = "https://www.baseball-reference.com/boxes/CLE/CLE202108120.shtml"

s = HTMLSession()

page = s.get(url, timeout=20)
page.html.render()

soup = BeautifulSoup(page.html.html, 'html.parser')

print(soup.prettify())
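
Before adding a rendering step, it may be worth checking whether the data is already in the raw response and only hidden inside an HTML comment. The following is a rough sketch using only the standard library. The function name `locate`, the sample markup, and the ids in it are hypothetical and made up for illustration, and the regex is a quick heuristic rather than a full HTML parser.

```python
import re

# Matches the body of every HTML comment, including multi-line ones
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def locate(html: str, needle: str) -> str:
    """Report whether `needle` is in the normal markup, inside a comment, or absent."""
    comments = COMMENT_RE.findall(html)
    visible = COMMENT_RE.sub("", html)  # the markup with all comments stripped
    if needle in visible:
        return "markup"
    if any(needle in comment for comment in comments):
        return "comment"
    return "absent"


# Hypothetical fragment: one table in the markup, one wrapped in a comment
sample = (
    '<div><table id="line_score"></table></div>'
    '<!-- <table id="demo_batting"><tr><td>Team Totals</td></tr></table> -->'
)

print(locate(sample, "line_score"))
print(locate(sample, "Team Totals"))
print(locate(sample, "pitching"))
```

With a real page you would pass `page.text` from `requests.get(url)` as `html`. If the result is "comment", parsing the comment text should be enough and no browser rendering is needed.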

CodePudding user response:

The other tables are present in the HTML that requests returns, but they are wrapped in HTML comments, so a normal parse skips them. You need to pull out the comments and parse their contents to get those additional tables:

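import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
import pandas as pd

url = "https://www.baseball-reference.com/boxes/CLE/CLE202108120.shtml"
result = requests.get(url).text
data = BeautifulSoup(result, 'html.parser')

# Collect every HTML comment node in the document
comments = data.find_all(string=lambda text: isinstance(text, Comment))

# Tables in the regular markup (reuses the response instead of fetching twice)
tables = pd.read_html(StringIO(result))
# Tables hidden inside comments
for each in comments:
    if 'table' in str(each):
        try:
            tables.append(pd.read_html(StringIO(str(each)))[0])
        except ValueError:
            # read_html raises ValueError when it finds no table to parse
            continue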

Output:

Oakland

print(tables[2].to_string())
                   Batting    AB     R     H   RBI    BB   SO    PA     BA    OBP    SLG    OPS    Pit    Str    WPA   aLI   WPA+     WPA-    cWPA  acLI  RE24    PO     A   Details
0            Mark Canha LF   6.0   1.0   1.0   3.0   0.0  0.0   6.0  0.247  0.379  0.415  0.793   23.0   19.0  0.011  0.58  0.040  -0.029%   0.01%  1.02   1.0   1.0   0.0        2B
1        Starling Marte CF   3.0   0.0   2.0   3.0   0.0  1.0   4.0  0.325  0.414  0.476  0.889   12.0    7.0  0.116  0.90  0.132  -0.016%   0.12%  1.57   2.8   1.0   0.0    2B,HBP
2   Stephen Piscotty PH-RF   1.0   0.0   1.0   2.0   0.0  0.0   2.0  0.211  0.272  0.349  0.622    7.0    3.0  0.000  0.00  0.000   0.000%      0%  0.00   2.0   1.0   0.0       HBP
3            Matt Olson 1B   6.0   0.0   1.0   2.0   0.0  0.0   6.0  0.283  0.376  0.566  0.941   21.0   13.0 -0.057  0.45  0.008  -0.065%  -0.06%  0.78  -0.6   9.0   1.0       GDP
4        Mitch Moreland DH   5.0   3.0   2.0   2.0   0.0  0.0   6.0  0.230  0.290  0.415  0.705   23.0   16.0  0.049  0.28  0.064  -0.015%   0.05%  0.50   1.5   NaN   NaN  2·HR,HBP
5         Josh Harrison 2B   0.0   1.0   0.0   0.0   1.0  0.0   1.0  0.294  0.366  0.435  0.801    7.0    3.0  0.057  1.50  0.057   0.000%   0.06%  2.63   0.6   0.0   0.0       NaN
6             Tony Kemp 2B   4.0   3.0   3.0   0.0   1.0  0.0   5.0  0.252  0.370  0.381  0.751   16.0   10.0 -0.001  0.14  0.009  -0.010%      0%  0.24   1.6   2.0   2.0       NaN
7            Sean Murphy C   4.0   3.0   2.0   2.0   2.0  1.0   6.0  0.224  0.318  0.419  0.737   25.0   15.0  0.143  0.38  0.151  -0.007%   0.15%  0.67   2.7   7.0   0.0        2B
8          Matt Chapman 3B   1.0   3.0   0.0   0.0   5.0  1.0   6.0  0.214  0.310  0.365  0.676   31.0   10.0  0.051  0.28  0.051   0.000%   0.05%  0.49   2.2   1.0   3.0       NaN
9         Seth Brown RF-CF   5.0   1.0   1.0   1.0   0.0  1.0   6.0  0.204  0.278  0.451  0.730   18.0   12.0 -0.067  0.40  0.000  -0.067%  -0.07%  0.70  -1.7   4.0   0.0        SF
10         Elvis Andrus SS   5.0   2.0   1.0   2.0   1.0  0.0   6.0  0.233  0.283  0.310  0.593   20.0   15.0  0.015  0.42  0.050  -0.034%   0.02%  0.73  -0.1   0.0   4.0       NaN
11                     NaN   NaN   NaN   NaN   NaN   NaN  NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN      NaN     NaN   NaN   NaN   NaN   NaN       NaN
12         Chris Bassitt P   NaN   NaN   NaN   NaN   NaN  NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN      NaN     NaN   NaN   NaN   1.0   0.0       NaN
13              A.J. Puk P   NaN   NaN   NaN   NaN   NaN  NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN      NaN     NaN   NaN   NaN   0.0   0.0       NaN
14         Deolis Guerra P   NaN   NaN   NaN   NaN   NaN  NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN      NaN     NaN   NaN   NaN   0.0   0.0       NaN
15          Jake Diekman P   NaN   NaN   NaN   NaN   NaN  NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN    NaN      NaN     NaN   NaN   NaN   0.0   0.0       NaN
16             Team Totals  40.0  17.0  14.0  17.0  10.0  4.0  54.0  0.350  0.500  0.575  1.075  203.0  123.0  0.317  0.41  0.562  -0.243%   0.33%  0.72  12.2  27.0  10.0       NaN
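
Picking a table by position (such as `tables[2]`) can break if the page layout changes, so it may be safer to look the table up by its id and read the "Team Totals" row directly. Below is a minimal offline sketch of that idea using BeautifulSoup only. The markup, the id `demo_batting`, and the helper names `find_commented_table` and `row_as_dict` are all hypothetical; the real page's ids and layout would need to be checked in the page source. The numbers are copied from the output above.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical fragment: the id and layout are made up for illustration
SAMPLE_HTML = """
<div id="all_demo_batting">
<!--
<table id="demo_batting">
  <thead><tr><th>Batting</th><th>AB</th><th>BA</th></tr></thead>
  <tbody>
    <tr><th>Mark Canha LF</th><td>6</td><td>.247</td></tr>
  </tbody>
  <tfoot><tr><th>Team Totals</th><td>40</td><td>.350</td></tr></tfoot>
</table>
-->
</div>
"""


def find_commented_table(html, table_id):
    """Return the <table> with the given id, searching inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment text as HTML so its tags become searchable
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


def row_as_dict(table, row_label):
    """Map column headers to cell text for the row whose first cell is row_label."""
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    for row in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if cells and cells[0] == row_label:
            return dict(zip(headers, cells))
    return None


table = find_commented_table(SAMPLE_HTML, "demo_batting")
totals = row_as_dict(table, "Team Totals")
print(totals["AB"], totals["BA"])
```

The values come back as strings, so convert them with `int()` or `float()` before doing arithmetic on them.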