I want to scrape the table on this webpage into a pandas DataFrame: https://www.perfectgame.org/College/CollegePlayerReports.aspx
I've tried both requests and requests-html, but neither seems to work:
    from requests_html import HTMLSession
    from requests import *
    from bs4 import BeautifulSoup
    import pandas as pd

    def get_stats(name, year):
        with HTMLSession() as s:
            source = 'https://www.perfectgame.org/College/CollegePlayerReports.aspx'
            response = s.get(source)
            table = response.html.find('table.Grid', first=True)
            df = pd.read_html(table.html, header=0)[0]
            print(df)
Any solutions?
CodePudding user response:
To load the table into a pandas DataFrame, you can use the following example:
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    # Collect every data row (rows carry the class rgRow or rgAltRow).
    data = []
    for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
        # Join the cell texts with "|" and split them back into a list.
        data.append(row.get_text(strip=True, separator="|").split("|"))

    df = pd.DataFrame(
        data,
        columns=["Reports", "Draft Eligible", "Class", "College", "Report Date"],
    )
    # to_markdown() requires the optional "tabulate" package.
    print(df.to_markdown(index=False))
Prints:
| Reports         | Draft Eligible | Class     | College | Report Date |
|-----------------|----------------|-----------|---------|-------------|
| Drew Williamson | 2022           | Senior    | Alabama | 6/1/2022    |
| Caden Rose      | 2023           | Sophomore | Alabama | 6/1/2022    |
| Wyatt Langford  | 2023           | Sophomore | Florida | 6/1/2022    |
| Nick Ficarrotta | 2022           | Freshman  | Florida | 6/1/2022    |
| Fisher Jameson  | 2024           | Freshman  | Florida | 6/1/2022    |
...
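
One caveat with joining and splitting on "|": get_text(strip=True) skips cells that are empty, so a row with a blank cell would come back with fewer than five values and no longer line up with the column list. A possible variant, offered as a sketch rather than a tested fix for this site, reads each td separately. The helper name rows_to_frame and the inline sample markup (including the second row with a blank cell) are made up for illustration, and the assumption that every data row has exactly five td cells is not verified against the live page.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Reports", "Draft Eligible", "Class", "College", "Report Date"]

# Hypothetical sample markup; only the row class names come from the answer above.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr class="rgRow"><td>Drew Williamson</td><td>2022</td><td>Senior</td><td>Alabama</td><td>6/1/2022</td></tr>
    <tr class="rgAltRow"><td>Example Player</td><td></td><td>Freshman</td><td>Florida</td><td>6/1/2022</td></tr>
  </tbody>
</table>
"""


def rows_to_frame(html, columns=COLUMNS):
    """Build a DataFrame from rgRow/rgAltRow rows, one value per <td>."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for row in soup.select("tr.rgRow, tr.rgAltRow"):
        # Reading cell by cell keeps empty cells as "" instead of dropping them.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Assumption: a data row has exactly one cell per column; others are skipped.
        if len(cells) == len(columns):
            data.append(cells)
    return pd.DataFrame(data, columns=columns)


df = rows_to_frame(SAMPLE_HTML)
print(df)
```

For the real page you would pass requests.get(url).content instead of SAMPLE_HTML. If the rows there contain extra cells, the length check would skip them, so the column list would need adjusting.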