I tried using pd.read_html to scrape a table, but the last 3 columns come back as NaN. Here is the code I used:
import pandas as pd
url = 'https://www.actionnetwork.com/mlb/public-betting'
todays_games = pd.read_html(url)[0]
There are 7 columns in total, and it grabs all of the headers, but not the data in the last 3 columns. I also tried parsing this using BeautifulSoup, but got the same result.
print(todays_games)
Scheduled Open ... Diff Bets
0 5:05 PM 951MarlinsMIA952NationalsWSH -118 100 ... NaN NaN
1 5:10 PM 979BrewersMIL980TigersDET -227 188 ... NaN NaN
2 7:07 PM 965RaysTB966Blue JaysTOR 150-175 ... NaN NaN
3 8:10 PM 967Red SoxBOS968MarinersSEA -125 105 ... NaN NaN
4 10:35 PM 953RedsCIN954PiratesPIT -154 135 ... NaN NaN
5 11:05 PM 955CubsCHC956PhilliesPHI 170-200 ... NaN NaN
6 11:05 PM 969YankeesNYY970OriolesBAL -227 188 ... NaN NaN
7 11:10 PM 957CardinalsSTL958MetsNYM 135-154 ... NaN NaN
8 11:20 PM 959RockiesCOL960BravesATL 170-200 ... NaN NaN
9 11:40 PM 971IndiansCLE972TwinsMIN 100-118 ... NaN NaN
10 Thu 9/16, 12:05 AM 973AstrosHOU974RangersTEX -213 175 ... NaN NaN
11 Thu 9/16, 12:10 AM 975AngelsLAA976White SoxCWS 160-189 ... NaN NaN
12 Thu 9/16, 12:10 AM 977AthleticsOAK978RoyalsKC -149 125 ... NaN NaN
13 Thu 9/16, 1:45 AM 961PadresSD962GiantsSF 103-120 ... NaN NaN
14 Thu 9/16, 2:10 AM 963DiamondbacksARI964DodgersLAD -185 155 ... NaN NaN
I'm assuming the problem has something to do with the HTML code. Can anyone help me solve this?
Answer:
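The values in the last three columns are most likely filled in by JavaScript after the page loads, so they are not in the static HTML that pd.read_html and BeautifulSoup see. The page appears to get its data from a JSON API, so you can send an HTTP GET to https://api.actionnetwork.com/web/v1/scoreboard/mlb?bookIds=15,30,68,75,69,76,71,79,247,123,263&date=20210915
and get the data you are looking for.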
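import requests

# The date query parameter is given as YYYYMMDD.
url = 'https://api.actionnetwork.com/web/v1/scoreboard/mlb?bookIds=15,30,68,75,69,76,71,79,247,123,263&date=20210915'
r = requests.get(url, timeout=10)
r.raise_for_status()  # raise an error on a non-2xx response
print(r.json())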
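Once you have the JSON, you will probably want it back in tabular form. Here is a minimal sketch of one way to flatten nested records into rows. The helper names (flatten, records_to_rows), the "games" key, and the sample payload are all made up for illustration; I have not verified the real response layout, so inspect the keys of r.json() first and adjust.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value  # lists and scalars are kept as-is
    return flat


def records_to_rows(payload, record_key="games"):
    """Return one flat dict per record found under payload[record_key].

    "games" is an assumed key name, not confirmed against the real API.
    """
    return [flatten(rec) for rec in payload.get(record_key, [])]


# Made-up payload for illustration only; the real response will differ.
sample = {
    "games": [
        {"id": 1, "status": "scheduled", "odds": {"ml_home": -118, "ml_away": 100}},
        {"id": 2, "status": "scheduled", "odds": {"ml_home": -227, "ml_away": 188}},
    ]
}
rows = records_to_rows(sample)
print(rows[0])
```

With the real response, something like rows = records_to_rows(r.json()) followed by pd.DataFrame(rows) should give one row per record. pandas also has pd.json_normalize, which does a similar flattening for you.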