I would like to scrape the 2nd table in the page seen below from the link - https://fbref.com/en/comps/82/stats/Indian-Super-League-Stats#all_stats_standard on google collab.
but pd.read_html("https://fbref.com/en/comps/82/stats/Indian-Super-League-Stats#all_stats_standard") only gives me the first table.
Please help me understand where I am going wrong.
CodePudding user response:
This is one way to read that data:
import pandas as pd
import requests
url= 'https://fbref.com/en/comps/82/stats/Indian-Super-League-Stats#all_stats_standard'
response = requests.get(url).text.replace('<!--', '').replace('-->', '')
df = pd.read_html(response, header=1)[2]
print(df)
Result in terminal:
Rk Player Nation Pos Squad Age Born MP Starts Min 90s Gls Ast G-PK PK PKatt CrdY CrdR Gls.1 Ast.1 G A G-PK.1 G A-PK Matches
0 1 Sahal Abdul Samad in IND MF Kerala Blasters 24 1997 20 19 1443 16.0 5 1 5 0 0 0 0 0.31 0.06 0.37 0.31 0.37 Matches
1 2 Ayush Adhikari in IND MF Kerala Blasters 21 2000 14 6 540 6.0 0 0 0 0 0 3 1 0.00 0.00 0.00 0.00 0.00 Matches
2 3 Gani Ahammed Nigam in IND FW NorthEast Utd 23 1998 6 0 66 0.7 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 Matches
3 4 Airam es ESP FW Goa 33 1987 13 8 751 8.3 6 1 5 1 2 0 0 0.72 0.12 0.84 0.60 0.72 Matches
4 5 Alex br BRA MF Jamshedpur 32 1988 20 12 1118 12.4 1 4 1 0 0 2 0 0.08 0.32 0.40 0.08 0.40 Matches
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
302 292 João Victor br BRA MF Hyderabad FC 32 1988 18 18 1590 17.7 5 1 3 2 2 3 0 0.28 0.06 0.34 0.17 0.23 Matches
303 293 David Williams au AUS FW Mohun Bagan 33 1988 15 6 602 6.7 4 1 4 0 1 2 0 0.60 0.15 0.75 0.60 0.75 Matches
304 294 Banana Yaya cm CMR DF Bengaluru 30 1991 5 2 229 2.5 0 1 0 0 0 1 0 0.00 0.39 0.39 0.00 0.39 Matches
305 295 Joe Zoherliana in IND DF NorthEast Utd 22 1999 9 6 677 7.5 0 1 0 0 0 0 0 0.00 0.13 0.13 0.00 0.13 Matches
306 296 Mark Zothanpuia in IND MF Hyderabad FC 19 2002 3 0 63 0.7 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 Matches
307 rows × 24 columns
Relevant pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html