I am trying to scrape lineups from https://www.rotowire.com/hockey/nhl-lineups.php
I would like the result in a dataframe like the following:
Team | Position | Player | Line |
---|---|---|---|
CAR | C | Sebastian Aho | Power Play #1 |
CAR | LW | Stefan Noesen | Power Play #1 |
....
This is what I have currently, but I am unsure how to match each player and position with the correct team and line, and how to put the result into a dataframe:
```python
import requests, pandas as pd
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/hockey/nhl-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
lineups = soup.find_all('div', {'class': ['lineups']})[0]

names = lineups.find_all('a', title=True)
for name in names:
    name = name.get('title')
    print(name)

positions = lineups.find_all('div', {'class': ['lineup__pos']})
for pos in positions:
    pos = pos.text
    print(pos)
```
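Collecting names and positions in two separate lists loses the context (team, line) each player sits in. One way to keep that context is to walk the page top-down, one game card at a time, and carry the current team and line along. The sketch below assumes a card container `div.lineup`, side classes `is-visit`/`is-home`, and the `lineup__*` class names used elsewhere in this thread; the HTML is a hypothetical, simplified stand-in (the "OPP", "Player X" and "Player Y" values are placeholders), so check the real markup before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: class names follow the ones used in this
# thread, but the nesting and the "OPP"/"Player X"/"Player Y" values are made up.
HTML = """
<div class="lineup">
  <div class="lineup__team is-visit"><img alt="CAR"></div>
  <div class="lineup__team is-home"><img alt="OPP"></div>
  <ul class="lineup__list is-visit">
    <li class="lineup__title">POWER PLAY #1</li>
    <li class="lineup__player"><div class="lineup__pos">C</div><a title="Sebastian Aho">S. Aho</a></li>
    <li class="lineup__title">POWER PLAY #2</li>
    <li class="lineup__player"><div class="lineup__pos">C</div><a title="Player X">P. X</a></li>
  </ul>
  <ul class="lineup__list is-home">
    <li class="lineup__title">POWER PLAY #1</li>
    <li class="lineup__player"><div class="lineup__pos">LW</div><a title="Player Y">P. Y</a></li>
  </ul>
</div>
"""


def parse_cards(soup):
    """Walk each game card top-down, carrying the current team and line."""
    rows = []
    for card in soup.select("div.lineup"):  # assumed card container class
        # Map the side class (e.g. "is-visit"/"is-home") to the team abbreviation.
        teams = {t["class"][-1]: t.img["alt"] for t in card.select(".lineup__team")}
        for ul in card.select("ul.lineup__list"):
            team = teams[ul["class"][-1]]
            line = None
            for li in ul.find_all("li", recursive=False):
                classes = li.get("class", [])
                if "lineup__title" in classes:
                    line = li.get_text(strip=True)  # a new line group starts here
                elif "lineup__player" in classes:
                    pos = li.select_one(".lineup__pos").get_text(strip=True)
                    rows.append((team, pos, li.a["title"], line))
    return rows


rows = parse_cards(BeautifulSoup(HTML, "html.parser"))
```

The resulting tuples can go straight into `pd.DataFrame(rows, columns=["Team", "Position", "Player", "Line"])`.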
Answer:
Try:
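```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/hockey/nhl-lineups.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
for a in soup.select(".lineup__player a"):
    name = a["title"]
    # the nearest <div> before the link is the position label
    pos = a.find_previous("div").text
    # the nearest preceding section title, e.g. "POWER PLAY #1"
    line = a.find_previous(class_="lineup__title").text
    # the last class of the enclosing list marks its side (e.g. home or visiting)
    lineup = a.find_previous(class_="lineup__list")["class"][-1]
    # the team block with the same side class holds an <img> whose alt text is the team
    team = a.find_previous(class_=f"lineup__team {lineup}").img["alt"]
    all_data.append((team, pos, name, line))

df = pd.DataFrame(all_data, columns=["Team", "Pos", "Player", "Line"])
print(df.to_markdown(index=False))  # to_markdown() needs the optional "tabulate" package
```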
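The answer relies on `find_previous()` searching backwards in document order, which includes enclosing elements such as the `lineup__list` a link sits in. A quick offline check of that loop, assuming a hypothetical, simplified snippet with the same class names (the "OPP", "Player X" and "Player Y" values are placeholders, not real page data), shows each player picking up the closest preceding title and the team on its own side:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (not the real page); class names follow the
# selectors used in the answer.
HTML = """
<div class="lineup__team is-visit"><img alt="CAR"></div>
<div class="lineup__team is-home"><img alt="OPP"></div>
<ul class="lineup__list is-visit">
  <li class="lineup__title">POWER PLAY #1</li>
  <li class="lineup__player"><div class="lineup__pos">C</div><a title="Sebastian Aho">S. Aho</a></li>
  <li class="lineup__title">POWER PLAY #2</li>
  <li class="lineup__player"><div class="lineup__pos">C</div><a title="Player X">P. X</a></li>
</ul>
<ul class="lineup__list is-home">
  <li class="lineup__title">POWER PLAY #1</li>
  <li class="lineup__player"><div class="lineup__pos">LW</div><a title="Player Y">P. Y</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = []
for a in soup.select(".lineup__player a"):
    pos = a.find_previous("div").text                     # position <div> just before the link
    line = a.find_previous(class_="lineup__title").text   # closest earlier title wins
    side = a.find_previous(class_="lineup__list")["class"][-1]  # enclosing list's side class
    # a class_ value containing a space matches the exact class attribute string
    team = a.find_previous(class_=f"lineup__team {side}").img["alt"]
    rows.append((team, pos, a["title"], line))
```

Because everything is resolved by position in the document, this approach would break if the page ever placed a title after the players it labels.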
Prints:
Team | Pos | Player | Line |
---|---|---|---|
CAR | C | Sebastian Aho | POWER PLAY #1 |
CAR | LW | Stefan Noesen | POWER PLAY #1 |
CAR | RW | Andrei Svechnikov | POWER PLAY #1 |
CAR | LD | Brent Burns | POWER PLAY #1 |
CAR | RD | Martin Necas | POWER PLAY #1 |
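The `Line` values come back upper-cased and the column is named `Pos`, while the question asked for `Power Play #1` under `Position`. A small post-processing step can close that gap; this is a sketch that assumes simple title-casing is acceptable for every line label (it may mis-case labels with apostrophes or mixed-case abbreviations):

```python
import pandas as pd

# Two rows taken from the output above.
df = pd.DataFrame(
    [("CAR", "C", "Sebastian Aho", "POWER PLAY #1"),
     ("CAR", "LW", "Stefan Noesen", "POWER PLAY #1")],
    columns=["Team", "Pos", "Player", "Line"],
)
# str.title() upper-cases the first letter of each word and lower-cases the rest;
# "#1" contains no letters, so it is left unchanged.
df["Line"] = df["Line"].str.title()
# Match the column name used in the question.
df = df.rename(columns={"Pos": "Position"})
```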