Table is not displayed with python requests

Time:09-16

I need to get the table inside the div with id=div_stats from this page: https://www.hockey-reference.com/leagues/NHL_2022.html

import requests
from bs4 import BeautifulSoup

url = 'https://www.hockey-reference.com/leagues/NHL_2022.html'


r = requests.get(url=url)
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find('div', id='div_stats')
print(table)
#None

The response status is 200, but the BeautifulSoup object contains no such div. If I open the page with Selenium or manually, it loads properly.

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

url = 'https://www.hockey-reference.com/leagues/NHL_2022.html'

with webdriver.Chrome() as browser:
    browser.get(url)
    #sleep(1)
    html = browser.page_source

#r = requests.get(url=url, stream=True)

soup = BeautifulSoup(html, 'html.parser')

table = soup.find_all('div', id='div_stats')

However, with webdriver the page can take quite a long time to load: even when I can already see the whole page, browser.get(url) is still loading and the code cannot continue. Is there a solution that avoids Selenium, or that stops the loading once the table is in the HTML? I tried stream and timeout in requests.get(), and an explicit wait:

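from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

for season in seasons:
    browser.get(url)
    wait = WebDriverWait(browser, 5)
    wait.until(EC.visibility_of_element_located((By.ID, 'div_stats')))
    html = browser.execute_script('return document.documentElement.outerHTML')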

None of that worked. Bless whoever can provide a proper solution to this.

CodePudding user response:

This is one way to get that table as a dataframe:

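from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}

url = 'https://www.hockey-reference.com/leagues/NHL_2022.html'
# The table markup sits inside an HTML comment in the raw response,
# so strip the comment markers before parsing.
response = requests.get(url, headers=headers).text.replace('<!--', '').replace('-->', '')
soup = bs(response, 'html.parser')
table_w_data = soup.select_one('table#stats')
# Wrap the markup in StringIO; newer pandas versions warn on literal HTML strings.
df = pd.read_html(StringIO(str(table_w_data)))[0]
print(df)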

Result in terminal:

0_level_0   Unnamed: 1_level_0  Unnamed: 2_level_0  Unnamed: 3_level_0  Unnamed: 4_level_0  Unnamed: 5_level_0  Unnamed: 6_level_0  Unnamed: 7_level_0  Unnamed: 8_level_0  Unnamed: 9_level_0  ... Special Teams   Shot Data   Unnamed: 31_level_0
Rk  Unnamed: 1_level_1  AvAge   GP  W   L   OL  PTS PTS%    GF  ... PK% SH  SHA PIM/G   oPIM/G  S   S%  SA  SV% SO
0   1.0 Florida Panthers*   27.8    82  58  18  6   122 0.744   337 ... 79.54   12  8   10.1    10.8    3062    11.0    2515    0.904   5
1   2.0 Colorado Avalanche* 28.2    82  56  19  7   119 0.726   308 ... 79.66   6   5   9.0 10.4    2874    10.7    2625    0.912   7
2   3.0 Carolina Hurricanes*    28.3    82  54  20  8   116 0.707   277 ... 88.04   4   3   9.2 7.7 2798    9.9 2310    0.913   6
3   4.0 Toronto Maple Leafs*    28.4    82  54  21  7   115 0.701   312 ... 82.05   13  4   8.6 8.5 2835    11.0    2511    0.900   7
4   5.0 Minnesota Wild* 29.4    82  53  22  7   113 0.689   305 ... 76.14   2   5   10.8    10.8    2666    11.4    2577    0.903   3
5   6.0 Calgary Flames* 28.8    82  50  21  11  111 0.677   291 ... 83.20   7   3   9.1 8.6 2908    10.0    2374    0.913   11
6   7.0 Tampa Bay Lightning*    29.6    82  51  23  8   110 0.671   285 ... 80.56   7   5   11.0    11.4    2535    11.2    2441    0.907   3
7   8.0 New York Rangers*   26.7    82  52  24  6   110 0.671   250 ... 82.30   8   2   8.2 8.2 2392    10.5    2528    0.919   9
8   9.0 St. Louis Blues*    28.8    82  49  22  11  109 0.665   309 ... 84.09   9   5   7.5 7.9 2492    12.4    2591    0.908   4
9   10.0    Boston Bruins*  28.5    82  51  26  5   107 0.652   253 ... 81.30   5   6   9.9 9.4 2962    8.5 2354    0.907   4
10  11.0    Edmonton Oilers*    29.1    82  49  27  6   104 0.634   285 ... 79.37   11  6   8.1 7.1 2790    10.2    2647    0.905   4
11  12.0    Pittsburgh Penguins*    29.7    82  46  25  11  103 0.628   269 ... 84.43   3   8   6.9 8.4 2849    9.4 2576    0.914   7
12  13.0    Washington Capitals*    29.5    82  44  26  12  100 0.610   270 ... 80.44   8   9   7.7 8.8 2577    10.5    2378    0.898   8
13  14.0    Los Angeles Kings*  28.0    82  44  27  11  99  0.604   235 ... 76.65   11  9   7.7 8.3 2865    8.2 2341    0.901   5
14  15.0    Dallas Stars*   29.4    82  46  30  6   98  0.598   233 ... 79.00   7   5   6.7 7.5 2486    9.4 2545    0.904   2
15  16.0    Nashville Predators*    27.7    82  45  30  7   97  0.591   262 ... 79.23   2   5   12.6    11.9    2439    10.7    2646    0.906   4
16  17.0    Vegas Golden Knights    28.5    82  43  31  8   94  0.573   262 ... 77.40   10  7   7.6 7.7 2830    9.3 2458    0.901   3
17  18.0    Vancouver Canucks   27.7    82  40  30  12  92  0.561   246 ... 74.89   5   6   8.0 8.6 2622    9.4 2612    0.912   1
18  19.0    Winnipeg Jets   28.2    82  39  32  11  89  0.543   250 ... 75.00   9   8   8.8 9.5 2645    9.5 2721    0.907   5
19  20.0    New York Islanders  30.1    82  37  35  10  84  0.512   229 ... 84.19   5   7   8.9 8.4 2367    9.7 2669    0.913   9
20  21.0    Columbus Blue Jackets   26.6    82  37  38  7   81  0.494   258 ... 78.57   7   6   7.7 7.2 2463    10.5    2887    0.897   2
21  22.0    San Jose Sharks 29.0    82  32  37  13  77  0.470   211 ... 85.20   4   11  8.8 8.6 2400    8.8 2622    0.900   3
22  23.0    Anaheim Ducks   27.9    82  31  37  14  76  0.463   228 ... 80.80   6   4   9.3 9.8 2393    9.5 2725    0.902   4
23  24.0    Buffalo Sabres  27.5    82  32  39  11  75  0.457   229 ... 76.42   6   6   8.1 7.9 2451    9.3 2702    0.894   1
24  25.0    Detroit Red Wings   26.9    82  32  40  10  74  0.451   227 ... 73.78   4   10  8.9 8.5 2414    9.4 2761    0.888   4
25  26.0    Ottawa Senators 26.6    82  33  42  7   73  0.445   224 ... 80.32   9   4   10.0    10.2    2463    9.1 2740    0.904   2
26  27.0    Chicago Blackhawks  28.0    82  28  42  12  68  0.415   213 ... 76.23   2   6   7.9 8.7 2362    9.0 2703    0.893   4
27  28.0    New Jersey Devils   25.8    82  27  46  9   63  0.384   245 ... 80.19   6   14  8.1 8.4 2562    9.6 2540    0.881   2
28  29.0    Philadelphia Flyers 28.3    82  25  46  11  61  0.372   210 ... 75.74   6   11  9.0 9.0 2539    8.3 2785    0.894   1
29  30.0    Seattle Kraken  28.7    82  27  49  6   60  0.366   213 ... 74.89   8   7   8.5 8.0 2380    8.9 2367    0.880   3
30  31.0    Arizona Coyotes 28.0    82  25  50  7   57  0.348   206 ... 75.00   3   4   10.2    8.2 2121    9.7 2910    0.894   1
31  32.0    Montreal Canadiens  27.8    82  22  49  11  55  0.335   218 ... 75.55   6   12  10.2    9.0 2442    8.9 2823    0.888   3
32  NaN League Average  28.2    82  41  32  9   91  0.555   255 ... 79.39   7   7   8.9 8.9 2593    9.8 2593    0.902   4
33 rows × 32 columns
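
The replace() calls above work because the table markup arrives inside an HTML comment, which is why soup.find() returned None in the question. As an alternative sketch, the comments can be parsed one by one instead of stripping every marker from the page. The HTML string below is a made-up stand-in for the downloaded page, and find_in_comments is a hypothetical helper, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup, Comment

# Made-up stand-in for the downloaded page: the div we want is inside
# an HTML comment, so the parser treats it as text, not as tags.
html = """
<div id="wrapper">
<!--
<div id="div_stats">
  <table id="stats"><tr><td>Florida Panthers</td></tr></table>
</div>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.find("div", id="div_stats"))  # nothing found in the regular tree


def find_in_comments(soup, name, **attrs):
    """Hypothetical helper: parse each comment as markup, return the first match."""
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        match = inner.find(name, **attrs)
        if match is not None:
            return match
    return None


div = find_in_comments(soup, "div", id="div_stats")
print(div.find("td").get_text())
```

This leaves the rest of the page untouched, which may matter if other comments on the page contain markup you do not want promoted into the document.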

Expect to do a little cleanup of that data, once you get it. Relevant documentation for pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html

And for requests: https://requests.readthedocs.io/en/latest/

And for BeautifulSoup: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
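
The cleanup mentioned above might look roughly like this. The small frame is a hand-built stand-in that mimics the two header rows and the League Average row from the output shown earlier (only four of the columns, and the top-level label for GP is a guess); tidy is a hypothetical helper, and renaming the unnamed column to "Team" and stripping the trailing asterisk are my own choices:

```python
import pandas as pd

# Hand-built stand-in for the frame returned by pd.read_html:
# two header levels, with "Unnamed: ..." placeholders for blank cells.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Unnamed: 1_level_1"),
    ("Unnamed: 3_level_0", "GP"),
    ("Special Teams", "PK%"),
])
df = pd.DataFrame(
    [
        [1.0, "Florida Panthers*", 82, 79.54],
        [2.0, "Colorado Avalanche*", 82, 79.66],
        [float("nan"), "League Average", 82, 79.39],
    ],
    columns=columns,
)


def tidy(frame):
    """Hypothetical cleanup: flatten headers, drop the summary row, fix types."""
    out = frame.copy()
    # Keep only the bottom header row (Rk, GP, PK%, ...).
    out.columns = out.columns.get_level_values(-1)
    out = out.rename(columns={"Unnamed: 1_level_1": "Team"})
    # The League Average row has no rank, so dropping NaN ranks removes it.
    out = out.dropna(subset=["Rk"]).reset_index(drop=True)
    return out.assign(
        Team=out["Team"].str.rstrip("*"),
        Rk=out["Rk"].astype(int),
    )


clean = tidy(df)
print(clean)
```

Keeping only the bottom header level drops group labels such as "Special Teams"; if those matter, join both levels into a single name instead.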
