How do I get all the tables from a website using pandas

Time:11-28

I am trying to get 3 tables from a particular website, but only the first two show up. I have even tried getting the data with BeautifulSoup, but the third table seems to be hidden somehow. Is there something I am missing?

import pandas as pd

url = "https://fbref.com/en/comps/9/keepersadv/Premier-League-Stats"
html = pd.read_html(url, header=1)  # returns a list of DataFrames
print(html[0])
print(html[1])
print(html[2])  # This raises an IndexError because the third table does not exist

The first two tables are the squad tables. The table not showing up is the individual player table. This also happens with similar pages from the same site.

CodePudding user response:

You could use Selenium as suggested, but I think it is a bit overkill. The table is available in the static HTML, just inside HTML comments, which is why pd.read_html does not find it. So you need to pull the comments out with BeautifulSoup and parse those for tables.
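To see what "within the comments" means in practice, here is a minimal sketch using only the standard library. The HTML snippet and the `TableFinder` class are made up for illustration and are not taken from the fbref page. It shows that an HTML parser reports a commented-out table as comment text rather than as a table element.

```python
from html.parser import HTMLParser

# Made-up page: one regular table and one table wrapped in an HTML comment.
SAMPLE_HTML = """
<table id="visible"><tr><th>Squad</th></tr><tr><td>Arsenal</td></tr></table>
<!--
<table id="hidden"><tr><th>Player</th></tr><tr><td>Alisson</td></tr></table>
-->
"""


class TableFinder(HTMLParser):
    """Hypothetical helper: records tables seen as elements and as comments."""

    def __init__(self):
        super().__init__()
        self.element_tables = []    # ids of <table> tags parsed as elements
        self.commented_tables = []  # raw markup of tables found inside comments

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.element_tables.append(dict(attrs).get("id"))

    def handle_comment(self, data):
        # The comment body arrives as plain text and is not parsed as HTML.
        if "<table" in data:
            self.commented_tables.append(data.strip())


finder = TableFinder()
finder.feed(SAMPLE_HTML)
finder.close()

print(finder.element_tables)         # ['visible']
print(len(finder.commented_tables))  # 1
```

The commented markup is still valid HTML, so it can be handed to a table parser as a separate string once it has been extracted.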

To get all the tables:

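from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://fbref.com/en/comps/9/keepersadv/Premier-League-Stats'
response = requests.get(url)

# Tables present as regular HTML elements.
# StringIO avoids the deprecation warning newer pandas versions emit for literal HTML strings.
tables = pd.read_html(StringIO(response.text), header=1)

# Get the tables within the comments
soup = BeautifulSoup(response.text, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for each in comments:
    if 'table' in str(each):
        try:
            table = pd.read_html(StringIO(str(each)), header=1)[0]
            # Drop the header rows that repeat inside the table body
            table = table[table['Rk'].ne('Rk')].reset_index(drop=True)
            tables.append(table)
        except (ValueError, KeyError):
            # No parsable table in this comment, or no 'Rk' column
            continue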

Output:

for table in tables:
    print(table)


              Squad  # Pl   90s  GA  PKA  ...  Stp  Stp%  #OPA  #OPA/90  AvgDist
0           Arsenal     2  12.0  17    0  ...   10   8.8     6     0.50     14.6
1       Aston Villa     2  12.0  20    0  ...    6   6.8    13     1.08     16.2
2         Brentford     2  12.0  17    1  ...   10   9.9    18     1.50     15.6
3          Brighton     2  12.0  14    2  ...   17  16.2    13     1.08     15.3
4           Burnley     1  12.0  20    0  ...   14  11.7    17     1.42     16.6
5           Chelsea     2  12.0   4    2  ...    8   8.5     5     0.42     14.0
6    Crystal Palace     1  12.0  17    0  ...    7   7.5     6     0.50     13.5
7           Everton     2  12.0  19    0  ...    8   7.4     7     0.58     13.7
8      Leeds United     1  12.0  20    1  ...    8  12.5    15     1.25     16.3
9    Leicester City     1  12.0  21    2  ...    9   8.4     7     0.58     13.0
10        Liverpool     2  12.0  11    0  ...    9   9.7    16     1.33     17.0
11  Manchester City     2  12.0   6    1  ...    5   8.1    16     1.33     17.5
12   Manchester Utd     1  12.0  21    0  ...    4   4.4     2     0.17     13.3
13    Newcastle Utd     2  12.0  27    4  ...   10   9.8     4     0.33     13.9
14     Norwich City     1  12.0  27    2  ...    6   5.1     5     0.42     12.4
15      Southampton     1  12.0  14    0  ...   16  13.9     2     0.17     12.9
16        Tottenham     1  12.0  17    1  ...    3   2.7     5     0.42     14.1
17          Watford     2  12.0  20    1  ...    6   5.5     9     0.75     15.4
18         West Ham     1  12.0  14    0  ...    6   5.3     1     0.08     11.9
19           Wolves     1  12.0  12    3  ...    9  10.0    10     0.83     15.5

[20 rows x 28 columns]
                 Squad  # Pl   90s  GA  PKA  ...  Stp  Stp%  #OPA  #OPA/90  AvgDist
0           vs Arsenal     2  12.0  13    0  ...    4   5.9    11     0.92     15.5
1       vs Aston Villa     2  12.0  16    2  ...   11   8.0     7     0.58     14.8
2         vs Brentford     2  12.0  16    1  ...   16  14.0     9     0.75     15.7
3          vs Brighton     2  12.0  12    3  ...   11  12.5     8     0.67     15.9
4           vs Burnley     1  12.0  14    0  ...   16  10.7    12     1.00     15.1
5           vs Chelsea     2  12.0  30    2  ...   10  11.1    11     0.92     14.2
6    vs Crystal Palace     1  12.0  18    2  ...    7   7.2     9     0.75     14.4
7           vs Everton     2  12.0  16    3  ...    7   7.6     7     0.58     13.8
8      vs Leeds United     1  12.0  12    1  ...    8   7.3     5     0.42     14.2
9    vs Leicester City     1  12.0  16    0  ...    2   3.3     7     0.58     14.3
10        vs Liverpool     2  12.0  35    1  ...   12   9.9    14     1.17     13.7
11  vs Manchester City     2  12.0  25    0  ...    8   6.7     4     0.33     13.1
12   vs Manchester Utd     1  12.0  20    0  ...    7   7.8     7     0.58     14.7
13    vs Newcastle Utd     2  12.0  15    0  ...    8   8.0     8     0.67     15.3
14     vs Norwich City     1  12.0   7    2  ...    5   5.7    16     1.33     17.3
15      vs Southampton     1  12.0  11    2  ...    4   3.7     9     0.75     14.0
16        vs Tottenham     1  12.0  11    1  ...    9  12.2     9     0.75     16.0
17          vs Watford     2  12.0  16    0  ...    8   8.2     9     0.75     15.3
18         vs West Ham     1  12.0  23    0  ...   13  10.5     6     0.50     13.8
19           vs Wolves     1  12.0  12    0  ...    5   6.8     9     0.75     15.3

[20 rows x 28 columns]
    Rk             Player   Nation Pos  ... #OPA #OPA/90 AvgDist  Matches
0    1            Alisson   br BRA  GK  ...   15    1.36    17.1  Matches
1    2  Kepa Arrizabalaga   es ESP  GK  ...    1    1.00    18.8  Matches
2    3    Daniel Bachmann   at AUT  GK  ...    1    0.25    12.2  Matches
3    4      Asmir Begović   ba BIH  GK  ...    0    0.00    15.0  Matches
4    5        Karl Darlow  eng ENG  GK  ...    4    0.50    14.9  Matches
5    6            Ederson   br BRA  GK  ...   14    1.27    17.5  Matches
6    7   Łukasz Fabiański   pl POL  GK  ...    1    0.08    11.9  Matches
7    8   Álvaro Fernández   es ESP  GK  ...    5    1.67    15.3  Matches
8    9         Ben Foster  eng ENG  GK  ...    8    1.00    16.8  Matches
9   10       David de Gea   es ESP  GK  ...    2    0.17    13.3  Matches
10  11     Vicente Guaita   es ESP  GK  ...    6    0.50    13.5  Matches
11  12  Caoimhín Kelleher   ie IRL  GK  ...    1    1.00    14.6  Matches
12  13           Tim Krul   nl NED  GK  ...    5    0.42    12.4  Matches
13  14         Bernd Leno   de GER  GK  ...    1    0.33    13.1  Matches
14  15        Hugo Lloris   fr FRA  GK  ...    5    0.42    14.1  Matches
15  16  Emiliano Martínez   ar ARG  GK  ...   12    1.09    16.4  Matches
16  17      Alex McCarthy  eng ENG  GK  ...    2    0.17    12.9  Matches
17  18      Edouard Mendy   sn SEN  GK  ...    4    0.36    13.3  Matches
18  19      Illan Meslier   fr FRA  GK  ...   15    1.25    16.3  Matches
19  20    Jordan Pickford  eng ENG  GK  ...    7    0.64    13.6  Matches
20  21          Nick Pope  eng ENG  GK  ...   17    1.42    16.6  Matches
21  22     Aaron Ramsdale  eng ENG  GK  ...    5    0.56    14.9  Matches
22  23         David Raya   es ESP  GK  ...   13    1.44    15.7  Matches
23  24            José Sá   pt POR  GK  ...   10    0.83    15.5  Matches
24  25     Robert Sánchez   es ESP  GK  ...   13    1.18    15.4  Matches
25  26  Kasper Schmeichel   dk DEN  GK  ...    7    0.58    13.0  Matches
26  27       Jason Steele  eng ENG  GK  ...    0    0.00    13.0  Matches
27  28          Jed Steer  eng ENG  GK  ...    1    1.00    14.3  Matches
28  29       Zack Steffen   us USA  GK  ...    2    2.00    17.8  Matches
29  30    Freddie Woodman  eng ENG  GK  ...    0    0.00    11.6  Matches

[30 rows x 34 columns]

CodePudding user response:

I originally assumed the player table was loaded with JavaScript and therefore not available in the static HTML.

See chitown88's answer: it turns out the table is available in the static HTML, just within the comments.


Here is another way using selenium-python:

pip install selenium

  1. Scrape the id="stats_keeper_adv" table
  2. Rename the unnamed columns
  3. Drop the repeated headers using loc

from io import StringIO

import pandas as pd
from selenium import webdriver

url = 'https://fbref.com/en/comps/9/keepersadv/Premier-League-Stats'

with webdriver.Chrome() as driver:
    driver.get(url)
    # Select the parent of the table so that innerHTML includes the <table> tag itself
    table = driver.find_element(by='xpath', value='//table[@id="stats_keeper_adv"]/..')
    html = table.get_attribute('innerHTML')
    df = pd.read_html(StringIO(html))[0]

# rename unnamed columns
df = df.rename(columns=lambda x: '' if x.startswith('Unnamed') else x)

# ignore repeated headers
df = df.loc[df[('', 'Rk')] != 'Rk']

Output:

                                                                            Goal Kicks                Crosses           Sweeper
    Rk             Player   Nation Pos            Squad     Age  Born  ...         Att Launch% AvgLen     Opp Stp  Stp%    #OPA #OPA/90  AvgDist
0    1            Alisson   br BRA  GK        Liverpool  29-056  1992  ...          59    47.5   40.6      90   9  10.0      15    1.36     17.1
1    2  Kepa Arrizabalaga   es ESP  GK          Chelsea  27-055  1994  ...           4     0.0    9.3       8   1  12.5       1    1.00     18.8
2    3    Daniel Bachmann   at AUT  GK          Watford  27-141  1994  ...          35    34.3   36.2      38   2   5.3       1    0.25     12.2
3    4      Asmir Begović   ba BIH  GK          Everton  34-160  1987  ...          12    66.7   49.6       5   1  20.0       0    0.00     15.0
4    5        Karl Darlow  eng ENG  GK    Newcastle Utd  31-050  1990  ...          64    78.1   59.8      69   8  11.6       4    0.50     14.9
5    6            Ederson   br BRA  GK  Manchester City  28-102  1993  ...          44    25.0   33.6      56   5   8.9      14    1.27     17.5
6    7   Łukasz Fabiański   pl POL  GK         West Ham  36-223  1985  ...          93    69.9   53.7     113   6   5.3       1    0.08     11.9
7    8   Álvaro Fernández   es ESP  GK        Brentford  23-228  1998  ...          19    42.1   34.4      27   1   3.7       5    1.67     15.3
8    9         Ben Foster  eng ENG  GK          Watford  38-238  1983  ...          69    87.0   63.9      72   4   5.6       8    1.00     16.8
9   10       David de Gea   es ESP  GK   Manchester Utd  31-020  1990  ...          99    44.4   38.8      91   4   4.4       2    0.17     13.3
10  11     Vicente Guaita   es ESP  GK   Crystal Palace  34-321  1987  ...          79    51.9   38.1      93   7   7.5       6    0.50     13.5
11  12  Caoimhín Kelleher   ie IRL  GK        Liverpool  23-004  1998  ...           5    20.0   20.6       3   0   0.0       1    1.00     14.6
12  13           Tim Krul   nl NED  GK     Norwich City  33-238  1988  ...         104    58.7   47.5     117   6   5.1       5    0.42     12.4
13  14         Bernd Leno   de GER  GK          Arsenal  29-268  1992  ...          26    61.5   45.7      30   2   6.7       1    0.33     13.1
14  15        Hugo Lloris   fr FRA  GK        Tottenham  34-336  1986  ...         104    53.8   41.2     110   3   2.7       5    0.42     14.1
15  16  Emiliano Martínez   ar ARG  GK      Aston Villa  29-086  1992  ...          87    48.3   41.2      80   5   6.3      12    1.09     16.4
16  17      Alex McCarthy  eng ENG  GK      Southampton  31-359  1989  ...          85    74.1   55.7     115  16  13.9       2    0.17     12.9
17  18      Edouard Mendy   sn SEN  GK          Chelsea  29-271  1992  ...          67    31.3   29.3      86   7   8.1       4    0.36     13.3
18  19      Illan Meslier   fr FRA  GK     Leeds United  21-270  2000  ...         100    32.0   32.5      64   8  12.5      15    1.25     16.3
19  20    Jordan Pickford  eng ENG  GK          Everton  27-265  1994  ...          91    80.2   64.5     103   7   6.8       7    0.64     13.6
20  21          Nick Pope  eng ENG  GK          Burnley  29-222  1992  ...          95    90.5   65.7     120  14  11.7      17    1.42     16.6
21  22     Aaron Ramsdale  eng ENG  GK          Arsenal  23-197  1998  ...          66    74.2   57.2      83   8   9.6       5    0.56     14.9
22  23         David Raya   es ESP  GK        Brentford  26-073  1995  ...          77    71.4   52.8      74   9  12.2      13    1.44     15.7
23  24            José Sá   pt POR  GK           Wolves  28-314  1993  ...          81    56.8   46.7      90   9  10.0      10    0.83     15.5
24  25     Robert Sánchez   es ESP  GK         Brighton  24-009  1997  ...          69    68.1   54.6      93  16  17.2      13    1.18     15.4
26  26  Kasper Schmeichel   dk DEN  GK   Leicester City  35-022  1986  ...         112    46.4   40.1     107   9   8.4       7    0.58     13.0
27  27       Jason Steele  eng ENG  GK         Brighton  31-101  1990  ...           6    50.0   43.8      11   1   9.1       0    0.00     13.0
28  28          Jed Steer  eng ENG  GK      Aston Villa  29-065  1992  ...           6    66.7   53.8       8   1  12.5       1    1.00     14.3
29  29       Zack Steffen   us USA  GK  Manchester City  26-239  1995  ...           7    28.6   25.3       6   0   0.0       2    2.00     17.8
30  30    Freddie Woodman  eng ENG  GK    Newcastle Utd  24-268  1997  ...          43    65.1   52.0      33   2   6.1       0    0.00     11.6

[30 rows x 34 columns]
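
The renaming and header-dropping steps can be tried offline without a browser. The following is a minimal sketch on a small made-up frame. The three columns and the rows are illustrative assumptions that only imitate the two-level header pandas produces for this kind of table.

```python
import pandas as pd

# Made-up two-level header: ungrouped columns get an "Unnamed: ..." top label.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Sweeper", "#OPA"),
])

# Made-up rows, including one repeated header line in the table body.
df = pd.DataFrame(
    [
        ["1", "Alisson", "15"],
        ["Rk", "Player", "#OPA"],  # repeated header row
        ["2", "Kepa Arrizabalaga", "1"],
    ],
    columns=columns,
)

# Blank out the placeholder labels; the function is applied to every level.
df = df.rename(columns=lambda x: "" if x.startswith("Unnamed") else x)

# Keep only the real data rows, then renumber the index.
df = df.loc[df[("", "Rk")] != "Rk"].reset_index(drop=True)

print(df)
```

The `reset_index(drop=True)` call is an addition to the approach above; without it the index keeps a gap where the repeated header row was removed.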