Web scraping multiple tables from a single webpage

Time:06-22

Looking for help! I found some code for a problem similar to mine. At a high level, I want to scrape multiple tables from the same webpage (for instance, 'per game' and 'totals').

Not sure if it matters, but I am using JupyterLab for this. I have very limited experience writing Python (but I'm trying to learn!), so I am having trouble tweaking the code to get what I want out of either of these websites:

https://www.sports-reference.com/cbb/players/jaden-ivey-1.html

or

https://basketball.realgm.com/player/Jaden-Ivey/Summary/148740

Essentially, the code below works for the fbref webpage, but when I replace that source link with either of the two sites above, I can't figure out how to get what I want.

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# This table sits inside an HTML comment on fbref, so find the first comment
# after the '#all_stats_standard' element and parse its text as HTML.
comment = soup.select_one('#all_stats_standard').find_next(string=lambda x: isinstance(x, Comment))
table = BeautifulSoup(comment, 'html.parser')

# Print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
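The fbref code depends on that page's markup: it looks up the `#all_stats_standard` element and parses the first HTML comment after it, so on a page without that id `select_one` returns `None` and the `.find_next` call fails. A more general sketch, assuming only that some tables may be hidden inside HTML comments as on fbref, collects every `<table>` on a page (visible or commented out) and keys it by its `id`. The function name `collect_tables`, the sample markup, and the ids `per_game` and `totals` are hypothetical, chosen for illustration.

```python
from bs4 import BeautifulSoup, Comment


def collect_tables(html):
    """Return {table_id: rows} for every <table> in `html`.

    Tables are collected from the normal markup and from inside HTML
    comments (the trick the fbref code uses). Each row is a list of
    cell strings taken from its <th> and <td> cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Tables that are already part of the parsed document tree.
    candidates = list(soup.find_all("table"))
    # Comment contents are not parsed, so re-parse any comment holding a table.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" in comment:
            hidden = BeautifulSoup(comment, "html.parser")
            candidates.extend(hidden.find_all("table"))

    tables = {}
    for index, table in enumerate(candidates):
        # Use the id attribute when present, otherwise a positional key.
        key = table.get("id", f"table_{index}")
        rows = []
        for tr in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        tables[key] = rows
    return tables


# Hand-written sample page: one visible table and one commented-out table.
sample_html = """
<table id="per_game"><tr><th>Season</th><th>PTS</th></tr>
<tr><td>2021-22</td><td>17.3</td></tr></table>
<!-- <table id="totals"><tr><th>Season</th><th>PTS</th></tr>
<tr><td>2021-22</td><td>624</td></tr></table> -->
"""

tables = collect_tables(sample_html)
print(sorted(tables))  # ['per_game', 'totals']
print(tables["totals"])  # [['Season', 'PTS'], ['2021-22', '624']]
```

To try it on a live page, something like `tables = collect_tables(requests.get(url).text)` followed by `print(list(tables))` should list the table ids that page actually uses; the sketch makes no assumption about which ids sports-reference or RealGM assign.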

I know there are similar questions on Stack Overflow, so I apologize if this is considered a duplicate, but I need further help since I'm new to this.

Thanks, Tim

Answer:

You can use pandas to pull the data from those tables easily.

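import pandas as pd

# read_html returns a list of DataFrames, one for each <table> it parses on the
# page; slicing [0:5] keeps the first five (the result is still a list).
tables = pd.read_html('https://www.sports-reference.com/cbb/players/jaden-ivey-1.html')[0:5]
print(tables)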

Output:

[    Season  School     Conf   G  GS    MP   FG  ...  STL  BLK  TOV   PF   PTS  Unnamed: 27    SOS
0  2020-21  Purdue  Big Ten  23  12  24.2  3.9  ...  0.7  0.7  1.3  1.7  11.1          NaN  11.23
1  2021-22  Purdue  Big Ten  36  34  31.4  5.6  ...  0.9  0.6  2.6  1.8  17.3          NaN   8.23
2   Career  Purdue      NaN  59  46  28.6  4.9  ...  0.8  0.6  2.1  1.7  14.9          NaN   9.73

[3 rows x 29 columns],     Season  School     Conf   G  GS    MP   FG   FGA  ...  DRB  TRB  AST  STL  BLK  TOV   PF   PTS
0  2020-21  Purdue  Big Ten  19  10  23.3  3.5   9.2  ...  2.7  3.6  2.1  0.8  0.7  1.4  1.6  10.3
1  2021-22  Purdue  Big Ten  19  17  32.6  5.5  12.8  ...  3.3  4.2  2.9  0.9  0.5  2.5  1.9  17.5
2   Career  Purdue      NaN  38  27  27.9  4.5  11.0  ...  3.0  3.9  2.5  0.9  0.6  1.9  1.8  13.9

[3 rows x 27 columns],     Season  School     Conf   G  GS    MP   FG  FGA  ...  DRB  TRB  AST  STL  BLK  TOV   PF  PTS
0  2020-21  Purdue  Big Ten  23  12   557   89  223  ...   57   76   43   17   16   31   39  256
1  2021-22  Purdue  Big Ten  36  34  1132  203  441  ...  152  176  110   33   20   94   63  624
2   Career  Purdue      NaN  59  46  1689  292  664  ...  209  252  153   50   36  125  102  880

[3 rows x 27 columns],     Season  School     Conf   G  GS    MP   FG  FGA  ...  DRB  TRB  AST  STL  BLK  TOV  PF  PTS
0  2020-21  Purdue  Big Ten  19  10   442   66  174  ...   51   68   39   15   13   26  31  195
1  2021-22  Purdue  Big Ten  19  17   620  104  244  ...   62   79   55   18   10   47  36  333
2   Career  Purdue      NaN  38  27  1062  170  418  ...  113  147   94   33   23   73  67  528

[3 rows x 27 columns],     Season  School     Conf   G  GS    MP   FG  ...  TRB  AST  STL  BLK  TOV   PF   PTS
0  2020-21  Purdue  Big Ten  23  12   557  6.4  ...  5.5  3.1  1.2  1.1  2.2  2.8  18.4       
1  2021-22  Purdue  Big Ten  36  34  1132  7.2  ...  6.2  3.9  1.2  0.7  3.3  2.2  22.0       
2   Career  Purdue      NaN  59  46  1689  6.9  ...  6.0  3.6  1.2  0.9  3.0  2.4  20.8       

[3 rows x 25 columns]]
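Because `read_html` returns a plain list, the 'per game' and 'totals' tables from the question can be picked out by position. Judging only from the printed output, the first table looks like per-game averages (MP of 24.2) and the third like season totals (MP of 557), but that mapping is inferred, not confirmed, and it may shift if the page layout changes. The helper `pick_tables` is hypothetical; it also drops the empty `Unnamed: 27` column visible in the first table.

```python
import pandas as pd


def pick_tables(tables, positions):
    """Map labels to DataFrames taken from a pd.read_html() result.

    `positions` is a dict such as {"per_game": 0, "totals": 2}. Those
    labels and indices are assumptions; check them against the page.
    """
    picked = {}
    for label, index in positions.items():
        df = tables[index]
        # Drop spacer columns that read_html names "Unnamed: N".
        keep = [col for col in df.columns if not str(col).startswith("Unnamed")]
        picked[label] = df[keep]
    return picked


# Small stand-ins for read_html output (2021-22 values copied from above).
sample = [
    pd.DataFrame({"Season": ["2021-22"], "PTS": [17.3],
                  "Unnamed: 27": [float("nan")], "SOS": [8.23]}),
    pd.DataFrame({"Season": ["2021-22"], "PTS": [17.5]}),
    pd.DataFrame({"Season": ["2021-22"], "PTS": [624]}),
]

picked = pick_tables(sample, {"per_game": 0, "totals": 2})
print(list(picked["per_game"].columns))  # ['Season', 'PTS', 'SOS']
print(picked["totals"]["PTS"].tolist())  # [624]
```

With a live page, something like `pick_tables(pd.read_html(url), {"per_game": 0, "totals": 2})` would apply the same selection. `read_html` also accepts `match=` and `attrs=` arguments for filtering tables by their text or HTML attributes, which is sturdier than relying on positions if you know a table's id.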