I'm trying to scrape this website: https://madduxsports.com/college-basketball-lines.php
I'm very new to Python and scraping. I believe the table on this website is generated with JavaScript.
I'm looking to scrape just the first 7 columns.
Here is what I've tried:
from requests_html import HTMLSession
from bs4 import BeautifulSoup
session = HTMLSession()
resp = session.get("https://madduxsports.com/college-basketball-lines.php")
resp.html.render()
soup = BeautifulSoup(resp.html.html, "lxml")
script_tags = soup.find_all("script")
print(script_tags)
This gets everything inside the <script> tags, which includes the table data, but I don't know how to get the first 7 columns out of it.
Thanks for the help
CodePudding user response:
You could get the data directly from the request the page makes, although you'll need to do a bit of manipulation of the HTML escape characters and so on. This gives you the same data as pulling it from the <script> tag. I can show you how to get it that way as well if you'd like, but in my opinion this is the better way.
import requests
import pandas as pd
url = 'https://madduxsports.com/newodds/v2/scheduler-ajax.php'
payload = {
    'timezone': 'America/New_York',
    'is_first_request': '0',
    'league_id': '4',
    'sport_id': '2',
    'period_id': '1'}
jsonData = requests.post(url, data=payload).json()
# Everything above is just to get the data
# jsonData is the same JSON you see in the <script> tag
odds = jsonData['odds']
schedulers = jsonData['schedulers']
odds_df = pd.json_normalize(odds)
schedulers_df = pd.json_normalize(schedulers)
# Map each sportsbook id to its name
names_dict = {}
for each in odds:
    names_dict[each['id']] = each['name']
# Replace the ids in the column names with the sportsbook names
cols = []
for col in schedulers_df:
    for k, v in names_dict.items():
        col = col.replace(str(k), v)
    cols.append(col)
schedulers_df.columns = cols
cols = ['date', 'team_ids',
        'team_names', 'score.away_score', 'score.home_score',
        'score.description', 'opener.1.away', 'opener.1.home']
odds_cols = [x for x in schedulers_df.columns if ('1.away' in x or '1.home' in x) and ('class' not in x)]
df = schedulers_df[cols + odds_cols]
Output:
print(df)
date team_ids ... odds.SIA.1.away odds.SIA.1.home
0 2021-12-03 00:00:00 306123<br>306124 ... 143½ -1½
1 2021-12-03 00:00:00 306127<br>306128 ... 142½ 11
2 2021-12-03 00:00:00 306129<br>306130 ... 126½u12 -5½
3 2021-12-03 00:00:00 306131<br>306132 ... 17 146½
4 2021-12-03 01:00:00 306133<br>306134 ... -2½ 135½
.. ... ... ... ... ...
107 2021-12-04 07:50:00 396155<br>396156 ...
108 2021-12-04 07:50:00 396157<br>396158 ...
109 2021-12-04 07:50:00 396159<br>396160 ...
110 2021-12-04 07:50:00 9875<br>9876 ...
111 2021-12-04 07:50:00 9877<br>9878 ...
[112 rows x 22 columns]
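To get down to just the first 7 columns and tidy up the HTML bits (the <br> separators you can see in team_ids, plus any escaped entities), something along these lines might work. This is only a sketch: the sample DataFrame is made up to look like the output above, and clean_cell and first_columns are hypothetical helper names, not part of pandas or the site's API. With the real data you would pass in df instead of sample.

```python
import html

import pandas as pd

# Made-up sample shaped like the output above (assumption: the real values
# may contain <br> separators and HTML entities such as &frac12;).
sample = pd.DataFrame({
    "date": ["2021-12-03 00:00:00", "2021-12-03 01:00:00"],
    "team_ids": ["306123<br>306124", "306133<br>306134"],
    "team_names": ["Team A<br>Team B", "Team C &amp; D<br>Team E"],
    "score.away_score": ["", ""],
    "score.home_score": ["", ""],
    "score.description": ["", ""],
    "opener.1.away": ["143&frac12;", "-2&frac12;"],
    "opener.1.home": ["-1&frac12;", "135&frac12;"],
})


def clean_cell(value):
    """Decode HTML entities and turn <br> separators into ' / '."""
    if not isinstance(value, str):
        return value  # leave numbers, NaN, etc. untouched
    return html.unescape(value).replace("<br>", " / ")


def first_columns(frame, n=7):
    """Return a cleaned copy of the first n columns of frame."""
    cleaned = frame.iloc[:, :n].copy()  # positional slice: columns 0..n-1
    for col in cleaned.columns:
        cleaned[col] = cleaned[col].map(clean_cell)
    return cleaned


result = first_columns(sample)
print(result)
```

Using iloc selects by position, so it keeps working even if the column names change. If you'd rather pick the columns by name, slicing the cols list from the answer would be another option.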