I am having trouble parsing the code for the NBA starting lineups and would love some help if possible.
Here is my code so far:
import requests
from bs4 import BeautifulSoup
url = "https://www.rotowire.com/basketball/nba-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
lineups = soup.find_all(class_='lineup__player')
print(lineups)
I am looking for the following data:
Player
Team
Position
I was hoping to scrape the data and then create a pandas DataFrame from the output.
Here is an example of my desired output:
Player Team Position
Dennis Schroder BOS PG
Romeo Langford BOS SG
Jayson Tatum BOS SF
Jabari Parker BOS PF
Grant Williams BOS C
Player Team Position
Kyle Lowry MIA PG
Duncan Robinson MIA SG
Jimmy Butler MIA SF
P.J. Tucker MIA PF
Bam Adebayo MIA C
... ... ...
I was able to find the Player data but was unable to parse it successfully. I can see that each player's full name is stored in the title attribute of the <a> tag.
Any tips on how to complete this project will be greatly appreciated. Thank you in advance for any help that you may offer.
I am only looking for the 5 starting players; there is no need to include the bench players. I am also not sure whether there is a way to add a blank line between each team, as in my desired output above.
Here is an example of the current output that I would like to parse:
[<li class="lineup__player is-pct-play-100" title="Very Likely To Play">
<div class="lineup__pos">PG</div>
<a href="/basketball/player.php?id=3444" title="Dennis Schroder">D. Schroder</a>
</li>, <li class="lineup__player is-pct-play-100" title="Very Likely To Play">
<div class="lineup__pos">SG</div>
<a href="/basketball/player.php?id=4762" title="Romeo Langford">R.
CodePudding user response:
You're on the right track. Here's one way to do it.
import requests, pandas
from bs4 import BeautifulSoup
url = "https://www.rotowire.com/basketball/nba-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
lineups = soup.find_all(class_='is-pct-play-100')
positions = [x.find('div').text for x in lineups]
names = [x.find('a')['title'] for x in lineups]
teams = sum([[x.text] * 5 for x in soup.find_all(class_='lineup__abbr')], [])
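# Name the columns so the frame matches the desired Player / Team / Position layout
df = pandas.DataFrame(list(zip(names, teams, positions)), columns=['Player', 'Team', 'Position'])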
print(df)
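The question also asks for a blank line between teams, which the snippet above does not do. One possible approach is sketched below, assuming the rows are already available as (player, team, position) tuples in page order. The function name format_by_team and its starters parameter are hypothetical, and the sample rows are copied from the question rather than scraped.

```python
from itertools import groupby

# Sample rows taken from the question; in practice these would come
# from the scraped names, teams and positions.
rows = [
    ("Dennis Schroder", "BOS", "PG"),
    ("Romeo Langford", "BOS", "SG"),
    ("Kyle Lowry", "MIA", "PG"),
    ("Duncan Robinson", "MIA", "SG"),
]


def format_by_team(rows, starters=5):
    """Return one text block per team, with a blank line between teams.

    Hypothetical helper: keeps at most `starters` rows per team.
    """
    blocks = []
    # groupby only merges adjacent rows, so rows must already be ordered by team
    for _, group in groupby(rows, key=lambda row: row[1]):
        lines = [f"{'Player':<20}{'Team':<6}Position"]
        for player, team, position in list(group)[:starters]:
            lines.append(f"{player:<20}{team:<6}{position}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(format_by_team(rows))
```

If the data is already in a DataFrame, df.itertuples(index=False, name=None) should yield rows in the same tuple form, provided the columns are in Player, Team, Position order.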