This is my script to scrape odds from a particular website (it should also work outside my country; I don't think there are restrictions yet):
from selenium import webdriver
from time import sleep
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
odds=[]
home=[]
away=[]
url = "https://www.efbet.it/scommesse/calcio/serie-c_1_31_-418"
driver = webdriver.Chrome(r"C:\chromedriver.exe")
driver.get(url)
sleep(5)
#driver.maximize_window()
#driver.find_element_by_id('onetrust-accept-btn-handler').click()
soup = BeautifulSoup(driver.page_source, "html.parser")
id = soup.find(class_="contenitore-table-grande")
for a in id.select("p[class*='tipoQuotazione_1']"):
    odds.append(a.text)
for a in id.select("p[class*='font-weight-bold m-0 text-right']"):
    home.append(a.text)
for a in id.select("p[class*='font-weight-bold m-0 text-left']"):
    away.append(a.text)
a=np.asarray(odds)
newa= a.reshape(42,10)
df = pd.DataFrame(newa)
df1 = pd.DataFrame(home)
df2 = pd.DataFrame(away)
dftot = pd.concat([df1, df2, df], axis=1)
It works fine now (I'm aware it could be written in a better and cleaner way), but there's an issue: when the website publishes new odds, some kinds are sometimes missing (e.g. under/over or double chance 1X, 12, X2). So I need to put a zero or null value where they are missing; otherwise my array would no longer have the expected length, and the odds would not line up with their respective matches. On inspection, I see that when a value is missing, the tipoQuotazione element simply has no text:
<p >1.75</p> with value
<p ></p> when missing
Is there a way to do this?
Thanks!
CodePudding user response:
... when new odds are published by the website, sometimes some kind of them are missing ...
As a design note, this is not the only problem you might run into. What if the website changes a class name? That would break your code as well.
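One way to make that kind of breakage easier to spot is to fail with a clear message as soon as the container is not found, instead of getting an AttributeError on None a few lines later. This is only a minimal sketch: `find_required` is a hypothetical helper name, and it assumes `soup` is the BeautifulSoup object from the question.

```python
def find_required(soup, class_name):
    """Return the first element with the given class, or raise a clear error."""
    node = soup.find(class_=class_name)
    if node is None:
        # Most likely cause: the site renamed the class or changed its layout.
        raise LookupError(
            f"No element with class {class_name!r}; the page layout may have changed"
        )
    return node

# Usage with the question's code (soup comes from driver.page_source):
# table = find_required(soup, "contenitore-table-grande")
```

This does not stop the scraper from breaking when the markup changes, but it should point straight at the selector that needs updating.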
... sometimes some kind of them are missing (i.e. under over or double chance 1X 12 X2). So i would need to put a zero or null value where they are missing ...
for a in id.select("p[class*='tipoQuotazione_1']"):
    # if a.text == "" default to 0.0
    odds.append(a.text or 0.0)
Or you can do it with an if statement:
    if not a.text:
        odds.append(0.0)
    else:
        odds.append(a.text)
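Putting the pieces together, here is a rough sketch of how the scraped texts could be turned into one row per match with the gaps kept in place. It uses None rather than 0.0 so that a missing quote cannot be mistaken for real odds. The helper names `to_odds` and `rows_of` and the sample values are made up for illustration; `texts` stands in for the `a.text` values collected in the loop above.

```python
def to_odds(text):
    """Convert scraped cell text to a float, or None when the cell is empty."""
    text = text.strip()
    return float(text) if text else None

def rows_of(values, per_match):
    """Split a flat list into rows of per_match values, one row per match."""
    if len(values) % per_match:
        raise ValueError(
            f"{len(values)} values cannot be split into rows of {per_match}"
        )
    return [values[i:i + per_match] for i in range(0, len(values), per_match)]

# Made-up sample: two matches with three odds each, two of them missing.
texts = ["1.75", "3.40", "", "2.10", "", "1.30"]
odds = [to_odds(t) for t in texts]
table = rows_of(odds, per_match=3)
print(table)  # [[1.75, 3.4, None], [2.1, None, 1.3]]
```

With the question's layout, `per_match` would be 10, and the number of rows follows from the data instead of being hard-coded as 42. Passing `table` to `pd.DataFrame` should then show the missing odds as NaN.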