Not able to scrape entire table using pd.read_html

Time:09-18

I tried using pd.read_html to scrape a table, but the last 3 columns come back as NaN. Here is the code I used:

import pandas as pd

url = 'https://www.actionnetwork.com/mlb/public-betting'

todays_games = pd.read_html(url)[0]

There are 7 columns in total, and it grabs all of the headers, but not the data in the last 3 columns. I also tried parsing this using BeautifulSoup, but got the same result.

print(todays_games)

                                                Scheduled      Open  ... Diff Bets
    0                5:05 PM 951MarlinsMIA952NationalsWSH  -118 100  ...  NaN  NaN
    1                   5:10 PM 979BrewersMIL980TigersDET  -227 188  ...  NaN  NaN
    2                    7:07 PM 965RaysTB966Blue JaysTOR   150-175  ...  NaN  NaN
    3                 8:10 PM 967Red SoxBOS968MarinersSEA  -125 105  ...  NaN  NaN
    4                    10:35 PM 953RedsCIN954PiratesPIT  -154 135  ...  NaN  NaN
    5                   11:05 PM 955CubsCHC956PhilliesPHI   170-200  ...  NaN  NaN
    6                 11:05 PM 969YankeesNYY970OriolesBAL  -227 188  ...  NaN  NaN
    7                  11:10 PM 957CardinalsSTL958MetsNYM   135-154  ...  NaN  NaN
    8                  11:20 PM 959RockiesCOL960BravesATL   170-200  ...  NaN  NaN
    9                   11:40 PM 971IndiansCLE972TwinsMIN   100-118  ...  NaN  NaN
    10       Thu 9/16, 12:05 AM 973AstrosHOU974RangersTEX  -213 175  ...  NaN  NaN
    11     Thu 9/16, 12:10 AM 975AngelsLAA976White SoxCWS   160-189  ...  NaN  NaN
    12      Thu 9/16, 12:10 AM 977AthleticsOAK978RoyalsKC  -149 125  ...  NaN  NaN
    13           Thu 9/16, 1:45 AM 961PadresSD962GiantsSF   103-120  ...  NaN  NaN
    14  Thu 9/16, 2:10 AM 963DiamondbacksARI964DodgersLAD  -185 155  ...  NaN  NaN

I'm assuming the problem has something to do with the HTML code. Can anyone help me solve this?

CodePudding user response:

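The last columns are probably filled in by JavaScript after the page loads, which would explain why neither pd.read_html nor BeautifulSoup finds them in the static HTML. Instead of scraping the page, send an HTTP GET request to

https://api.actionnetwork.com/web/v1/scoreboard/mlb?bookIds=15,30,68,75,69,76,71,79,247,123,263&date=20210915

to get the data you are looking for.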

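import requests

# Query the JSON endpoint directly instead of scraping the rendered HTML.
r = requests.get(
    'https://api.actionnetwork.com/web/v1/scoreboard/mlb?bookIds=15,30,68,75,69,76,71,79,247,123,263&date=20210915',
    timeout=30)
r.raise_for_status()  # fail loudly on a non-2xx response
print(r.json())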
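
To get from that JSON back to a table, something like the following might work. It is only a sketch: the structure of the response is not shown here, so the helper names (scoreboard_params, flatten, to_frame) and the sample field names in the comments are hypothetical, and you will need to inspect r.json() to find which key actually holds the list of games.

```python
from datetime import date


def scoreboard_params(day, book_ids):
    """Build the query parameters used in the URL above for a given day."""
    return {
        "bookIds": ",".join(str(b) for b in book_ids),
        "date": day.strftime("%Y%m%d"),  # e.g. 20210915
    }


def flatten(record, prefix=""):
    """Flatten nested dicts into one level with dotted keys.

    {"teams": {"home": "WSH"}} becomes {"teams.home": "WSH"}.
    Lists are left untouched as cell values.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_frame(records):
    """Turn a list of (possibly nested) dicts into a DataFrame."""
    import pandas as pd  # imported here so the helpers above work without pandas

    return pd.DataFrame([flatten(r) for r in records])
```

The params dict can be passed as requests.get(base_url, params=...) so the date does not have to be hardcoded; note that requests will percent-encode the commas in bookIds, which a server normally decodes to the same value. Once you know which key holds the games, to_frame(payload[that_key]) gives you one row per game. pandas also ships pd.json_normalize, which does the same kind of flattening.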