Python: scrape data from an .aspx web page

Time:08-20

I want to scrape the table from this webpage into a pandas DataFrame: https://www.perfectgame.org/College/CollegePlayerReports.aspx

I've tried both requests and requests-html, but neither seems to work:

from requests_html import HTMLSession
import pandas as pd


def get_stats(name, year):
    # name and year are not used yet
    with HTMLSession() as s:
        source = 'https://www.perfectgame.org/College/CollegePlayerReports.aspx'
        response = s.get(source)
        # find() returns None when no element matches the selector
        table = response.html.find('table.Grid', first=True)
        df = pd.read_html(table.html, header=0)[0]
        print(df)

Any solutions?

CodePudding user response:

To load the table into a pandas DataFrame, you can use the following example:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Collect the text of every data row (regular and alternating rows)
data = []
for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
    # Join the cell texts with "|" and split again to get one value per cell
    data.append(row.get_text(strip=True, separator="|").split("|"))

df = pd.DataFrame(
    data,
    columns=["Reports", "Draft Eligible", "Class", "College", "Report Date"],
)
# to_markdown() requires the tabulate package
print(df.to_markdown(index=False))

Prints:

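| Reports         | Draft Eligible | Class     | College | Report Date |
|-----------------|----------------|-----------|---------|-------------|
| Drew Williamson | 2022           | Senior    | Alabama | 6/1/2022    |
| Caden Rose      | 2023           | Sophomore | Alabama | 6/1/2022    |
| Wyatt Langford  | 2023           | Sophomore | Florida | 6/1/2022    |
| Nick Ficarrotta | 2022           | Freshman  | Florida | 6/1/2022    |
| Fisher Jameson  | 2024           | Freshman  | Florida | 6/1/2022    |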

...
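
One caveat with the join-and-split approach above: an empty cell, or a cell whose text contains "|", would shift the values into the wrong columns. Here is a minimal sketch that reads each <td> separately instead. It runs offline on a made-up HTML snippet: the player names and the SAMPLE_HTML, COLUMNS and rows_to_frame names are hypothetical, and the markup only imitates the rgRow / rgAltRow row classes used in the selector above, so the real page may be structured differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the grid's row classes (rgRow / rgAltRow).
# The names and values are invented for illustration; the real page may differ.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr class="rgRow">
      <td><a href="#">Player One</a></td><td>2022</td><td>Senior</td>
      <td>Alabama</td><td>6/1/2022</td>
    </tr>
    <tr class="rgAltRow">
      <td><a href="#">Player Two</a></td><td>2023</td><td></td>
      <td>Florida</td><td>6/1/2022</td>
    </tr>
  </tbody>
</table>
"""

COLUMNS = ["Reports", "Draft Eligible", "Class", "College", "Report Date"]


def rows_to_frame(html: str) -> pd.DataFrame:
    """Build a DataFrame with one value per <td>, keeping empty cells."""
    soup = BeautifulSoup(html, "html.parser")
    data = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow")
    ]
    return pd.DataFrame(data, columns=COLUMNS)


df = rows_to_frame(SAMPLE_HTML)
print(df)
```

To try it against the live page, pass requests.get(url).text to rows_to_frame instead of SAMPLE_HTML. Whether the real rows always have exactly five cells is an assumption here.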
