How to Scrape NBA starting lineups and create a Pandas DataFrame?

Time:10-16

I am having trouble parsing the HTML for the NBA starting lineups and would appreciate some help.

Here is my code so far:

import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/basketball/nba-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

lineups = soup.find_all(class_='lineup__player')
print(lineups)

I am looking for the following data:

  1. Player

  2. Team

  3. Position

I was hoping to scrape the data and then create a Pandas DataFrame from the output.

Here is an example of my desired output:

    Player        Team   Position
Dennis Schroder    BOS      PG
Romeo Langford     BOS      SG
Jayson Tatum       BOS      SF
Jabari Parker      BOS      PF
Grant Williams     BOS      C

    Player        Team   Position
Kyle Lowry         MIA      PG
Duncan Robinson    MIA      SG
Jimmy Butler       MIA      SF
P.J. Tucker        MIA      PF
Bam Adebayo        MIA      C

...                ...      ...

I was able to find the Player data but was unable to parse it. I can see each player's full name in the title attribute of the <a> tag.

Any tips on how to complete this project will be greatly appreciated. Thank you in advance for any help that you may offer.

I am just looking for the five starting players; there is no need to include the bench players. I am also not sure whether there is a way to add a blank line between teams, as in my desired output above.

Here is an example of the current output that I would like to parse:

 
[<li class="lineup__player is-pct-play-100" title="Very Likely To Play">
<div class="lineup__pos">PG</div>
<a href="/basketball/player.php?id=3444" title="Dennis Schroder">D. Schroder</a>
</li>, <li class="lineup__player is-pct-play-100" title="Very Likely To Play">
<div class="lineup__pos">SG</div>
<a href="/basketball/player.php?id=4762" title="Romeo Langford">R.

CodePudding user response:

You're on the right track. Here's one way to do it.

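import pandas
import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/basketball/nba-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Select the entries tagged is-pct-play-100 ("Very Likely To Play" in the question's output).
lineups = soup.find_all(class_='is-pct-play-100')
# The position is the text of the first <div> inside each entry.
positions = [x.find('div').text for x in lineups]
# The full player name is stored in the title attribute of the <a> tag.
names = [x.find('a')['title'] for x in lineups]
# Repeat each team abbreviation five times; this assumes exactly five matched players per team.
teams = sum([[x.text] * 5 for x in soup.find_all(class_='lineup__abbr')], [])

# Name the columns so the DataFrame matches the desired output.
df = pandas.DataFrame(list(zip(names, teams, positions)), columns=['Player', 'Team', 'Position'])
print(df)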
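
If a team has fewer or more than five matching entries, the repeated team labels above can drift out of step with the players. One possible alternative, sketched below on a small inline HTML sample rather than the live page, is to walk each team's container and read the team, position, and name together. The wrapper class lineup__box, the one-team-per-box layout, the sample link text, and the helper names parse_lineups and format_by_team are assumptions made for this illustration; the real page may be structured differently, so the selectors would need adjusting. The second helper prints a blank line between teams, as asked in the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. The <li> entries mirror the question's
# output, but the wrapping "lineup__box" layout is an assumption for this sketch.
SAMPLE_HTML = """
<div class="lineup__box">
  <div class="lineup__abbr">BOS</div>
  <ul>
    <li class="lineup__player is-pct-play-100" title="Very Likely To Play">
      <div class="lineup__pos">PG</div>
      <a title="Dennis Schroder">D. Schroder</a>
    </li>
    <li class="lineup__player is-pct-play-100" title="Very Likely To Play">
      <div class="lineup__pos">SF</div>
      <a title="Jayson Tatum">J. Tatum</a>
    </li>
  </ul>
</div>
<div class="lineup__box">
  <div class="lineup__abbr">MIA</div>
  <ul>
    <li class="lineup__player is-pct-play-100" title="Very Likely To Play">
      <div class="lineup__pos">PG</div>
      <a title="Kyle Lowry">K. Lowry</a>
    </li>
    <li class="lineup__player is-pct-play-100" title="Very Likely To Play">
      <div class="lineup__pos">SF</div>
      <a title="Jimmy Butler">J. Butler</a>
    </li>
  </ul>
</div>
"""


def parse_lineups(html, starters=5):
    """Return a DataFrame with at most `starters` players per team container."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for box in soup.find_all(class_="lineup__box"):
        team = box.find(class_="lineup__abbr").get_text(strip=True)
        # Keep only the first `starters` entries listed for this team.
        for li in box.find_all("li", class_="lineup__player")[:starters]:
            rows.append({
                "Player": li.find("a")["title"],
                "Team": team,
                "Position": li.find(class_="lineup__pos").get_text(strip=True),
            })
    return pd.DataFrame(rows, columns=["Player", "Team", "Position"])


def format_by_team(df):
    """Render one table per team, separated by a blank line."""
    blocks = [
        group.to_string(index=False)
        for _, group in df.groupby("Team", sort=False)
    ]
    return "\n\n".join(blocks)


df = parse_lineups(SAMPLE_HTML)
print(format_by_team(df))
```

Reading the team and its players from the same container keeps each row consistent even when the number of listed players varies, and groupby with sort=False keeps the teams in the order they appear on the page.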