Dealing with missing values when scraping with bs4

Time:01-31

This is my script to scrape odds from a particular website (it should also work outside my country; I don't think there are any restrictions yet):

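from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By  # used by the cookie-banner line below
from time import sleep
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

odds = []
home = []
away = []

url = "https://www.efbet.it/scommesse/calcio/serie-c_1_31_-418"

# Selenium 4 expects the chromedriver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(r"C:\chromedriver.exe"))
driver.get(url)
sleep(5)  # crude wait for the page's JavaScript to render the odds
#driver.maximize_window()

#driver.find_element(By.ID, 'onetrust-accept-btn-handler').click()
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # the browser is no longer needed once the HTML is captured

# container of the odds table ("id" shadows the built-in id(), but works)
id = soup.find(class_="contenitore-table-grande")

# one <p> per odd, in page order
for a in id.select("p[class*='tipoQuotazione_1']"):
    odds.append(a.text)

# home and away team names
for a in id.select("p[class*='font-weight-bold m-0 text-right']"):
    home.append(a.text)

for a in id.select("p[class*='font-weight-bold m-0 text-left']"):
    away.append(a.text)

# one row per match: 10 odds per match, number of rows inferred
a = np.asarray(odds)
newa = a.reshape(-1, 10)

df = pd.DataFrame(newa)
df1 = pd.DataFrame(home)
df2 = pd.DataFrame(away)

dftot = pd.concat([df1, df2, df], axis=1)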

Now it works fine (I'm aware it could be written in a better and cleaner way), but there's an issue: when the website publishes new odds, some markets are sometimes missing (e.g. Under/Over, or Double Chance 1X, 12, X2). So I need to put a zero or null value where they are missing; otherwise the array length no longer lines up with the number of matches, and the odds get assigned to the wrong matches. Inspecting the page, I see that when a value is missing, the tipoQuotazione element is still there but contains no text:

<p >1.75</p>   when a value is present

<p ></p>   when the value is missing
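Because the empty `<p>` is still present in the page, `select` still returns it; only its text is empty. A minimal check, using a hypothetical HTML fragment that mimics the markup described above (the fragment and its exact class strings are assumptions, not copied from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one market with an odd, one without.
html = """
<div class="contenitore-table-grande">
  <p class="tipoQuotazione_1">1.75</p>
  <p class="tipoQuotazione_1"></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find(class_="contenitore-table-grande")

# Both <p> tags match the selector; the empty one simply yields "".
texts = [p.get_text(strip=True) for p in container.select("p[class*='tipoQuotazione_1']")]
print(texts)  # ['1.75', '']
```

Since the empty tags stay in the result list, replacing `""` with a placeholder keeps every odd in its original position.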

Is there a way to do this?

Thanks!

Answer:

... when new odds are published by the website, sometimes some kind of them are missing ...

As a broader design point, this is not the only problem you might run into. What if the website changes a class name? That would break your code as well.
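One way to make a renamed class fail loudly, instead of silently producing empty or misaligned data, is to check that each selector matches something before using it. A minimal sketch; `select_required` is a hypothetical helper, not part of bs4, and the HTML is a made-up example:

```python
from bs4 import BeautifulSoup


def select_required(root, selector):
    """Return root.select(selector), raising if nothing matches.

    Hypothetical helper: an empty result usually means the page
    layout or a class name has changed.
    """
    found = root.select(selector)
    if not found:
        raise LookupError(f"selector matched nothing: {selector!r}")
    return found


soup = BeautifulSoup('<div><p class="tipoQuotazione_1">2.10</p></div>', "html.parser")
print(len(select_required(soup, "p[class*='tipoQuotazione_1']")))  # 1
```

The same idea applies to `soup.find(...)`: it returns `None` when the container class disappears, so checking for `None` gives a clearer error than the `AttributeError` raised by `None.select(...)`.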

... sometimes some kind of them are missing (i.e. under over or double chance 1X 12 X2). So i would need to put a zero or null value where they are missing ...

for a in id.select("p[class*='tipoQuotazione_1']"):
    # empty (or whitespace-only) text means the odd is missing: default to 0.0
    odds.append(a.get_text(strip=True) or 0.0)
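If the odds are later used as numbers, it may be cleaner to convert each one to `float` and use NaN rather than 0.0 for a missing market: pandas treats NaN as missing (for example, `DataFrame.mean()` skips it by default), whereas a 0.0 would be averaged in like a real odd. A minimal sketch; `parse_odd` is a hypothetical helper, and it assumes the site prints odds with a dot as decimal separator:

```python
import math


def parse_odd(text):
    """Convert the text of one odds cell to a float, or NaN if it is empty."""
    text = text.strip()
    if not text:
        return math.nan  # missing market: keep a placeholder in its slot
    return float(text)  # assumes "1.75"-style text; anything else raises ValueError


values = [parse_odd(t) for t in ["1.75", "", " 3.20 "]]
print(values)  # [1.75, nan, 3.2]
```

Inside the scraping loop this becomes `odds.append(parse_odd(a.get_text()))`.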

Alternatively, use an explicit if statement inside the same loop:

if not a.text:
    odds.append(0.0)
else:
    odds.append(a.text)
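Filling the gaps keeps the odds aligned, but nothing checks that alignment before the reshape. A sketch that compares the odds count with the number of scraped matches and then groups the flat list into one row per match. The figure of 10 odds per match comes from the question's reshape; `group_odds` is a hypothetical helper:

```python
def group_odds(odds, n_matches, per_match=10):
    """Split a flat list of odds into one row per match.

    Raises ValueError if the total count does not fit, which is the
    symptom of a market having been skipped instead of filled in.
    """
    expected = n_matches * per_match
    if len(odds) != expected:
        raise ValueError(
            f"expected {expected} odds for {n_matches} matches, got {len(odds)}"
        )
    return [odds[i:i + per_match] for i in range(0, expected, per_match)]


rows = group_odds(list(range(20)), n_matches=2)
print(len(rows), rows[1][0])  # 2 10
```

With the scraped lists this would be used as `df = pd.DataFrame(group_odds(odds, len(home)))`, followed by the same `pd.concat` as in the question.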