Beautiful Soup Scraping

Time:01-02

I am trying to scrape lineups from https://www.rotowire.com/hockey/nhl-lineups.php

I would like the resulting dataframe to look like this:

Team  Position  Player         Line
CAR   C         Sebastian Aho  Power Play #1
CAR   LW        Stefan Noesen  Power Play #1

....

This is what I have so far, but I am unsure how to get the team and line to match up with the players and positions, and how to put the result into a dataframe:

import requests, pandas as pd
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/hockey/nhl-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

lineups = soup.find_all('div', {'class':['lineups']})[0]
names = lineups.find_all('a', title=True)
for name in names:
    name = name.get('title')
    print(name)
positions = lineups.find_all('div',  {'class':['lineup__pos']})
for pos in positions:
    pos = pos.text
    print(pos)

CodePudding user response:

Try:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/hockey/nhl-lineups.php"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
for a in soup.select(".lineup__player a"):
    name = a["title"]
    pos = a.find_previous("div").text
    line = a.find_previous(class_="lineup__title").text

    lineup = a.find_previous(class_="lineup__list")["class"][-1]
    team = a.find_previous(class_=f"lineup__team {lineup}").img["alt"]

    all_data.append((team, pos, name, line))

df = pd.DataFrame(all_data, columns=["Team", "Pos", "Player", "Line"])
print(df.to_markdown(index=False))

Prints:

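| Team   | Pos   | Player            | Line          |
|:-------|:------|:------------------|:--------------|
| CAR    | C     | Sebastian Aho     | POWER PLAY #1 |
| CAR    | LW    | Stefan Noesen     | POWER PLAY #1 |
| CAR    | RW    | Andrei Svechnikov | POWER PLAY #1 |
| CAR    | LD    | Brent Burns       | POWER PLAY #1 |
| CAR    | RD    | Martin Necas      | POWER PLAY #1 |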
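
To see why the find_previous calls pair each player with the right team and line, it may help to run the same logic on a small inline fragment instead of the live page. The sketch below is only an illustration: SAMPLE_HTML is a hypothetical, simplified stand-in for the page markup (the "is-visit"/"is-home" classes, the "XYZ" team and "Example Player" are assumptions), and extract_rows is a helper name introduced here, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="lineup">
  <div class="lineup__team is-visit"><img alt="CAR"></div>
  <div class="lineup__team is-home"><img alt="XYZ"></div>
  <ul class="lineup__list is-visit">
    <li class="lineup__title">POWER PLAY #1</li>
    <li class="lineup__player"><div class="lineup__pos">C</div><a title="Sebastian Aho" href="#">S. Aho</a></li>
    <li class="lineup__player"><div class="lineup__pos">LW</div><a title="Stefan Noesen" href="#">S. Noesen</a></li>
  </ul>
  <ul class="lineup__list is-home">
    <li class="lineup__title">POWER PLAY #1</li>
    <li class="lineup__player"><div class="lineup__pos">C</div><a title="Example Player" href="#">E. Player</a></li>
  </ul>
</div>
"""


def extract_rows(html):
    """Return (team, pos, name, line) tuples, one per player link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for a in soup.select(".lineup__player a"):
        name = a["title"]
        # find_previous walks backwards in document order, so the nearest
        # preceding <div> is the position box just before the link.
        pos = a.find_previous("div").text
        # The nearest preceding title is the heading of the current line.
        line = a.find_previous(class_="lineup__title").text
        # The enclosing list carries a side marker as its last class.
        side = a.find_previous(class_="lineup__list")["class"][-1]
        # Match the team box whose full class string has the same side.
        team = a.find_previous(class_=f"lineup__team {side}").img["alt"]
        rows.append((team, pos, name, line))
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The key point is that find_previous also considers enclosing tags, because their opening tags come earlier in the document. That is how the side marker on the enclosing list can be read and then reused to pick the matching team box. The returned tuples can be passed to pd.DataFrame with the same column names as in the answer.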