Looking for help! I found some code for a problem similar to my own. At a high level, I want to scrape multiple tables from the same webpage (for instance, 'per game' and 'totals').
Not sure if it matters, but I am using JupyterLab for this. I have very limited experience writing Python (but I'm trying to learn!), so I am having trouble tweaking the code to get what I want out of either of these websites:
https://www.sports-reference.com/cbb/players/jaden-ivey-1.html
or
https://basketball.realgm.com/player/Jaden-Ivey/Summary/148740
Essentially, the code below works for the fbref webpage, but when I replace that source link with either of the two sites above, I can't figure out how to get what I want.
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The stats table sits inside an HTML comment, so find that comment and parse it separately.
comment = soup.select_one('#all_stats_standard').find_next(string=lambda x: isinstance(x, Comment))
table = BeautifulSoup(comment, 'html.parser')

# Print some information from the table to the screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
I know there are similar questions on Stack Overflow, so I apologize if this is considered a duplicate request, but I need further assistance since I'm new to this.
Thanks, Tim
CodePudding user response:
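You can use pandas to pull the data from those tables easily. pd.read_html returns a list of DataFrames, one for each table it finds in the page, and the slice below keeps the first five.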
import pandas as pd
url = 'https://www.sports-reference.com/cbb/players/jaden-ivey-1.html'
# read_html returns a list of DataFrames; keep the first five tables
dfs = pd.read_html(url)[0:5]
print(dfs)
Output:
[ Season School Conf G GS MP FG ... STL BLK TOV PF PTS Unnamed: 27 SOS
0 2020-21 Purdue Big Ten 23 12 24.2 3.9 ... 0.7 0.7 1.3 1.7 11.1 NaN 11.23
1 2021-22 Purdue Big Ten 36 34 31.4 5.6 ... 0.9 0.6 2.6 1.8 17.3 NaN 8.23
2 Career Purdue NaN 59 46 28.6 4.9 ... 0.8 0.6 2.1 1.7 14.9 NaN 9.73
[3 rows x 29 columns], Season School Conf G GS MP FG FGA ... DRB TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 19 10 23.3 3.5 9.2 ... 2.7 3.6 2.1 0.8 0.7 1.4 1.6 10.3
1 2021-22 Purdue Big Ten 19 17 32.6 5.5 12.8 ... 3.3 4.2 2.9 0.9 0.5 2.5 1.9 17.5
2 Career Purdue NaN 38 27 27.9 4.5 11.0 ... 3.0 3.9 2.5 0.9 0.6 1.9 1.8 13.9
[3 rows x 27 columns], Season School Conf G GS MP FG FGA ... DRB TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 23 12 557 89 223 ... 57 76 43 17 16 31 39 256
1 2021-22 Purdue Big Ten 36 34 1132 203 441 ... 152 176 110 33 20 94 63 624
2 Career Purdue NaN 59 46 1689 292 664 ... 209 252 153 50 36 125 102 880
[3 rows x 27 columns], Season School Conf G GS MP FG FGA ... DRB TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 19 10 442 66 174 ... 51 68 39 15 13 26 31 195
1 2021-22 Purdue Big Ten 19 17 620 104 244 ... 62 79 55 18 10 47 36 333
2 Career Purdue NaN 38 27 1062 170 418 ... 113 147 94 33 23 73 67 528
[3 rows x 27 columns], Season School Conf G GS MP FG ... TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 23 12 557 6.4 ... 5.5 3.1 1.2 1.1 2.2 2.8 18.4
1 2021-22 Purdue Big Ten 36 34 1132 7.2 ... 6.2 3.9 1.2 0.7 3.3 2.2 22.0
2 Career Purdue NaN 59 46 1689 6.9 ... 6.0 3.6 1.2 0.9 3.0 2.4 20.8
[3 rows x 25 columns]]
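If a table you expect is missing from that list, it may be wrapped in an HTML comment, which is the case the fbref code in the question handles. Below is a rough sketch of one way to collect both visible and commented-out tables and look them up by id instead of by position. It runs on a small inline sample rather than the live pages. The ids "per_game" and "totals", the sample markup, and the helper names table_to_frame and tables_by_id are all made up for illustration, and I have not checked how either of the two sites actually labels its tables.

```python
from bs4 import BeautifulSoup, Comment
import pandas as pd

# Stand-in for a downloaded page: one visible table and one hidden in a comment.
# The ids "per_game" and "totals" are assumptions made for this sketch.
SAMPLE_HTML = """
<div id="all_per_game">
<table id="per_game">
<tr><th>Season</th><th>G</th><th>PTS</th></tr>
<tr><td>2020-21</td><td>23</td><td>11.1</td></tr>
<tr><td>2021-22</td><td>36</td><td>17.3</td></tr>
</table>
</div>
<div id="all_totals">
<!--
<table id="totals">
<tr><th>Season</th><th>G</th><th>PTS</th></tr>
<tr><td>2020-21</td><td>23</td><td>256</td></tr>
<tr><td>2021-22</td><td>36</td><td>624</td></tr>
</table>
-->
</div>
"""


def table_to_frame(table):
    """Turn one <table> tag into a DataFrame, using the first row as the header."""
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    return pd.DataFrame(rows[1:], columns=rows[0])


def tables_by_id(html):
    """Return {table id: DataFrame} for visible and commented-out tables."""
    soup = BeautifulSoup(html, "html.parser")
    tables = list(soup.find_all("table"))
    # Tables inside comments are not part of the parsed tree, so re-parse each comment.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        tables.extend(BeautifulSoup(comment, "html.parser").find_all("table"))
    # Tables without an id attribute are skipped.
    return {t.get("id"): table_to_frame(t) for t in tables if t.get("id")}


frames = tables_by_id(SAMPLE_HTML)
print(frames["per_game"])
print(frames["totals"])
```

For a real page you would pass requests.get(url).text to tables_by_id and print frames.keys() to see which ids the site uses. Every cell comes back as a string here, so numeric columns would still need converting, for example with pd.to_numeric.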