How to modify code to scrape data off of 2nd table on this webpage

Time:06-07

I am trying to scrape data from a table on the following website: https://www.eliteprospects.com/league/nhl/stats/2021-2022

This is the code I found that successfully scrapes the first table (skater stats):

import requests
import pandas as pd
from bs4 import BeautifulSoup

dfs = []
for page in range(1,10):
    url = f"https://www.eliteprospects.com/league/nhl/stats/2021-2022?sort=tp&page={page}"
    print(f"Loading {url=}")
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    df = (
        pd.read_html(str(soup.select_one(".player-stats")))[0]
        .dropna(how="all")
        .reset_index(drop=True)
    )
    dfs.append(df)

df_final = pd.concat(dfs).reset_index(drop=True)
print(df_final)
df_final.to_csv("data.csv", index=False)

But I am having difficulty scraping the goalie stats from the bottom table. Any idea how to modify the code to get the stats from the bottom table? I tried changing the selector in the pd.read_html line from ".player-stats" to ".goalie-stats", but it raised an error when I ran the code.

Thank you!!

CodePudding user response:

I found a way to get the data, but it isn't perfect: the result contains a lot of unnamed columns. Still, it gets the data, so I hope it's helpful.

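import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

dfs = []
# This URL pages through the goalie table with the page-goalie query parameter.
for page in range(1, 3):
    url = f"https://www.eliteprospects.com/league/nhl/stats/2021-2022?sort-goalie-stats=svp&page-goalie={page}#goalies"
    print(f"Loading {url=}")
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    # Select the goalie table and strip "%" signs from its HTML before parsing.
    html = str(soup.select_one(".goalie-stats")).replace("%", "")
    df = (
        # Wrap the HTML in StringIO; newer pandas versions deprecate passing a raw HTML string.
        pd.read_html(StringIO(html))[0]
        .dropna(how="all")
        .reset_index(drop=True)
    )
    dfs.append(df)

# Combine all pages into one DataFrame and save it.
df_final = pd.concat(dfs).reset_index(drop=True)
print(df_final)
df_final.to_csv("data.csv", index=False)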
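
If the unnamed columns get in the way, one possible cleanup is to drop them after concatenating. This is only a rough sketch: drop_unnamed_columns is a hypothetical helper name, the sample rows are made up, and it assumes the column labels are flat strings where pandas has filled blank headers with labels starting with "Unnamed". I have not checked it against the real column names on the page.

```python
import pandas as pd


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without fully empty or placeholder-named columns.

    Hypothetical helper; assumes flat (non-MultiIndex) column labels.
    """
    # Drop columns that contain no data at all.
    cleaned = df.dropna(axis="columns", how="all")
    # Drop columns whose label starts with "Unnamed" (placeholder for a blank header).
    keep = [col for col in cleaned.columns if not str(col).startswith("Unnamed")]
    return cleaned[keep].copy()


# Made-up rows, only to show the effect of the helper.
sample = pd.DataFrame(
    {
        "Unnamed: 0": [None, None],
        "Player": ["Goalie A", "Goalie B"],
        "GP": [10, 12],
        "Unnamed: 3": [None, None],
    }
)
print(drop_unnamed_columns(sample).columns.tolist())
```

With the scraped data you would call it as df_final = drop_unnamed_columns(df_final) right before to_csv. Check first that none of the unnamed columns hold values you want to keep.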