I'm currently looking to scrape https://www.bestfightodds.com/ for an MMA machine learning project. I'm specifically looking for the DraftKings opening odds for each fighter which is found by clicking on the odds for a given fighter under the DraftKings column. You are then presented with a popup table that shows how the betting odds have changed over time. The table presents you with the openings odds and the latest (current) odds.
I have no issue scraping the fighter names, but I can't figure out how to scrape the opening odds in the popup table. The HTML code from the popup table only appears in the inspect function when you click on it which is why I get a 'None' when I try to find it in the site's HTML.
This is my code so far:
# Importing packages
from bs4 import BeautifulSoup
import requests
# Specifying website URL
html_text = requests.get('https://www.bestfightodds.com/events/ufc-273-2411').text
soup = BeautifulSoup(html_text, 'lxml')
# Finding values
fighter_names = soup.find_all('span', class_ = 't-b-fcc')
opening_odds = soup.find_all('span, style_ = 'margin-left: 4px; margin-right: 4px;')
for fighter_names in soup.find_all('span', class_ = 't-b-fcc'):
print (fighter_names.get_text())
Here is a photo of where and how to locate the opening odds. The blue box is where you click to find the red one, which is the one I need to scrape for all fighters.
CodePudding user response:
The pop-ups are triggered by JavaScript, so your scrapper needs to be able to inject JavaScript into the website. I know apify.com uses what is called Headless chrome/chromium automation. You can check out this python library Headless Chrome/Chromium automation library (unofficial port of puppeteer) on GitHub.
CodePudding user response:
Fun little project. The data the server sends are encoded by custom JavaScript function, so you need to use selenium
or rewrite the decoding function to Python.
I used js2py
to execute the javascript function directly in python (and not use selenium
- it rewrites the javascript function to python automatically), but you can rewrite it to Python if you wish:
import json
import js2py
import requests
from bs4 import BeautifulSoup
js_decode_func = r"""function $(e) {
var t,
a,
r,
s,
o,
i,
l = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 /=',
n = '',
d = 0;
for (e = e.replace(/[^A-Za-z0-9\ \/\=]/g, ''); d < e.length;) t = l.indexOf(e.charAt(d )) << 2 | (s = l.indexOf(e.charAt(d ))) >> 4,
a = (15 & s) << 4 | (o = l.indexOf(e.charAt(d ))) >> 2,
r = (3 & o) << 6 | (i = l.indexOf(e.charAt(d ))),
n = String.fromCharCode(t),
64 != o && (n = String.fromCharCode(a)),
64 != i && (n = String.fromCharCode(r));
for (var c = '', h = 0, p = c1 = c2 = 0; h < n.length;)(p = n.charCodeAt(h)) < 128 ? (c = String.fromCharCode(p), h ) : 191 < p && p < 224 ? (c2 = n.charCodeAt(h 1), c = String.fromCharCode((31 & p) << 6 | 63 & c2), h = 2) : (c2 = n.charCodeAt(h 1), c3 = n.charCodeAt(h 2), c = String.fromCharCode((15 & p) << 12 | (63 & c2) << 6 | 63 & c3), h = 3);
var u,
f,
m,
g = '!"#$%&\'()* ,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~',
y = new String,
$ = g.length;
for (u = 0; u < c.length; u ) m = c.charAt(u),
0 <= (f = g.indexOf(m)) && (m = g.charAt((f $ / 2) % $)),
y = m;
return y
}"""
js_get_value_func = r"""function $(e) {
return 2 <= e ? ' ' Math.round(100 * (e - 1)) : e < 2 ? '' Math.round( - 100 / (e - 1)) : 'error'
}"""
decode = js2py.eval_js(js_decode_func)
get_value = js2py.eval_js(js_get_value_func)
url = "https://www.bestfightodds.com/"
api_url = "https://www.bestfightodds.com/api/ggd"
params = {"b": "22", "m": "25728", "p": "1"}
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for td in soup.select("td[data-li]"):
vals = json.loads(td["data-li"])
if len(vals) != 3 or vals[0] != 22: # 22 - DraftKings
continue
params["b"], params["p"], params["m"] = vals
name = td.find_previous(class_="t-b-fcc").text
encoded_text = requests.get(api_url, params=params).text
data = json.loads(decode(encoded_text))
first_value = get_value(data[0]["data"][0]["y"])
print(name, first_value)
Prints:
Alexander Volkanovski -450
Chan Sung Jung 340
Aljamain Sterling 320
Petr Yan -425
Mackenzie Dern 120
Tecia Torres -140
Mark Madsen 130
Vinc Pichel -150
Darian Weeks 190
Ian Garry -235
Mickey Gall 145
Mike Malott. -165
Aspen Ladd 155
Raquel Pennington -180
Anthony Hernandez -180
Josh Fremd 155
Aleksei Oleinik -105
Jared Vanderaa -115
Kay Hansen -150
Piera Rodriguez 130
Daniel Santos 175
Julio Arce -210
Belal Muhammad 150
Vicente Luque -170
Devin Clark -160
William Knight 140
Jordan Leavitt 110
Trey Ogden. -130
Elizeu Zaleski Dos Santos -195
Mounir Lazzez 165
Pat Sabatini -305
T.J. Laramie 240
Mayra Bueno Silva -365
Yanan Wu 280
Lina Akhtar Lansberg 245
Pannie Kianzad -310
Chris Barnett 165
Martin Buday -195
Andre Fialho 150
Miguel Baeza -170
Brandon Jenkins 320
Drakkar Klose -425
Jesse Ronson 110
Rafa Garcia -130
Caio Borralho 115
Gadzhi Omargadzhiev -135
Istela Nunes -190
Sam Hughes 160
Heili Alateng -180
Kevin Croom 155
Carla Esparza 150
Rose Namajunas -170
Glover Teixeira 155
Jiri Prochazka -180
Dustin Poirier -435
Nate Diaz 330
Charles Oliveira -160
Justin Gaethje 140
Gilbert Burns 280
Khamzat Chimaev -365
Arman Tsarukyan -335
Joel Alvarez 260
Calvin Cattar 170
Giga Chikadze -200