How to navigate the parse tree when the scraped content doesn't match the web page content

Time:08-24

I want to scrape the data in the players table at this link for my own personal use: https://fbref.com/en/comps/9/stats/Premier-League-Stats

However, no matter how I try to navigate the parse tree, I can never seem to access the actual table statistics part of the html for the players.

[Image: Web page HTML for player stats]

In the web page, the element containing the table has the attribute id="div_stats_standard". When I search the soup for it in my Jupyter Notebook with this code:

import requests
from bs4 import BeautifulSoup
url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
table = soup.find_all(id="div_stats_standard")
print(table)

I get the output:

[]

Even stranger, when I scroll through the soup to the place where the element appears in the web page HTML, it's not there. I have marked where the id should be in the image below. Can anyone help me with this, please?

[Image: Web scraping code]

CodePudding user response:

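The table markup appears to be wrapped in an HTML comment in the raw response, so BeautifulSoup sees it as comment text rather than as tags, and find_all returns nothing. Stripping the comment markers before parsing exposes it. This is how you can obtain the tables on that page (I imagine the one you're looking for is the last dataframe):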

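from io import StringIO

import pandas as pd
import requests

# Browser-like User-Agent; some sites reject the default one sent by requests.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}

url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
# Strip the HTML comment markers so that tables wrapped in comments become real markup.
response = requests.get(url, headers=headers).text.replace('<!--', '').replace('-->', '')
# Wrap the text in StringIO: recent pandas versions deprecate passing literal HTML strings.
dfs = pd.read_html(StringIO(response))
for df in dfs:
    print(df)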

This will return the available tables on the page:

         Squad  # Pl   Age  Poss  MP  Starts  Min  90s  Gls  Ast  G-PK  PK  PKatt  CrdY  CrdR   Gls   Ast   G+A  G-PK  G+A-PK   xG  npxG   xA  npxG+xA    xG    xA  xG+xA  npxG  npxG+xA
0      Arsenal    16  24.6  51.0   3      33  270  3.0    8    6     8   0      0     4     0  2.67  2.00  4.67  2.67    4.67  5.4   5.4  3.1      8.5  1.80  1.04   2.83  1.80     2.83
1  Aston Villa    18  26.8  57.7   3      33  270  3.0    3    3     3   0      0     8     0  1.00  1.00  2.00  1.00    2.00  3.5   3.5  2.9      6.4  1.17  0.97   2.14  1.17     2.14
2  Bournemouth    18  26.2  36.3   3      33  270  3.0    2    1     2   0      0     8     0  0.67  0.33  1.00  0.67    1.00  0.9   0.9  0.5      1.5  0.30  0.18   0.48  0.30     0.48
3    Brentford    19  26.2  44.7   3      33  270  3.0    8    6     8   0      0     2     0  2.67  2.00  4.67  2.67    4.67  3.7   3.7  3.0      6.8  1.24  1.01   2.25  1.24     2.25
4     Brighton    17  28.0  47.7   3      33  270  3.0    4    2     3   1      1     3     0  1.33  0.67  2.00  1.00    1.67  2.6   2.6  1.9      4.5  0.86  0.64   1.50  0.86     1.50
5      Chelsea    17  28.1  62.3   3      33  270  3.0    3    2     2   1      1     8     1  1.00  0.67  1.67  0.67    1.33  3.1   2.3  1.9      4.3  1.02  0.64   1.66  0.78     1.42
[...]
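
If you would rather stay with BeautifulSoup and navigate the parse tree, as the question asks, one option is to search inside the HTML comments instead of deleting the comment markers from the whole page. The sketch below is only an illustration: SAMPLE_HTML is a made-up, trimmed-down stand-in for the raw response (the "wrapper" id and the player row are invented), and find_in_comments is a hypothetical helper name, not part of BeautifulSoup. It assumes the element you want sits inside a comment, which is what the comment stripping above suggests.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the raw response: the table markup sits inside
# an HTML comment, so the parser stores it as text rather than as tags.
SAMPLE_HTML = """
<div id="wrapper">
  <!--
  <div id="div_stats_standard">
    <table>
      <tr><th>Player</th><th>Gls</th></tr>
      <tr><td>Player A</td><td>3</td></tr>
    </table>
  </div>
  -->
</div>
"""


def find_in_comments(html, element_id):
    """Return the element with the given id, looking inside comments if needed."""
    soup = BeautifulSoup(html, "html.parser")
    found = soup.find(id=element_id)
    if found is not None:
        return found
    # Comment nodes are strings in the tree; re-parse each one as HTML.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        found = inner.find(id=element_id)
        if found is not None:
            return found
    return None


# A plain search misses the element, as in the question.
plain = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all(id="div_stats_standard")
print(plain)

# Searching inside the comments finds it, and the rows can then be read out.
div = find_in_comments(SAMPLE_HTML, "div_stats_standard")
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in div.find_all("tr")
]
print(rows)
```

With the real page you would pass the text of the response instead of SAMPLE_HTML. This keeps the rest of the document untouched, which can matter if the page contains comments that are not valid markup.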