Match Data in Excel using Python

Time:11-11

I am making betting software that scrapes fixtures from a website, writes them to an Excel file, and then compares that file with another Excel file that I have already created manually (the source) to find the winner. The source file would look like this: [image]. The scraped output would look like this: [image].

What I want is for the software to find the fixtures that appear in both files, then work out the result using the "Result" column in the source file.

The problem is that I don't know how to do it, and I have no idea what to search for. I'm a complete beginner and I picked this project for school. Can someone at least give me a name for what I'm trying to do (if it has one), an idea, a path to follow, etc.?

Thanks a lot to everyone!

Here is my code so far (it only scrapes the page and creates the Excel file):

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

# Build the URL to scrape
url = "https://www.bbc.com/sport/football/scores-fixtures/"
date = input("Input the date, hit Enter for today: ")
src = url + date

# Scrape the web page: <abbr> tags that have a title attribute
html = requests.get(src).text
soup = bs(html, 'lxml')

select = soup.find_all('abbr')

fixtures = []
for abbr in select:
    if abbr.has_attr('title'):
        output = str(abbr['title'])
        fixtures.append(output)

# Build the Excel table; slicing is list[start::step]

table = pd.DataFrame()
table['Home'] = pd.Series(fixtures[::2])
table['Away'] = pd.Series(fixtures[1::2])
name = 'fixtures' + '-' + date + '.xlsx'
table.to_excel(name, index=False)

CodePudding user response:

The table at this URL is not so easy to manipulate, so I suggest using Selenium:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions() 
options.add_argument('--no-default-browser-check')
options.add_argument('--log-level=3')
options.add_argument('--headless')
service = Service('chromedriver.exe')
driver = webdriver.Chrome(options=options, service=service)
date = input("Input the date, hit Enter for today: ")
driver.get("https://www.bbc.com/sport/football/scores-fixtures/" + date)
games = driver.find_elements(By.XPATH, "//article[@data-event-id]")
ListHome = []
ListAway = []
ListTime = []
for game in games:
    ListHome.append(game.find_element(By.XPATH,".//span[contains(@class,'team--time-home')]").text)
    ListAway.append(game.find_element(By.XPATH,".//span[contains(@class,'team--time-away')]").text)
df = pd.DataFrame(list(zip(ListHome, ListAway)), columns =['Home', 'Away'])
df.to_excel('name.xlsx', index=False)

Use pd.merge to compare with your other spreadsheet.
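As a rough sketch of what that merge could look like: the example below assumes the source file has "Home", "Away" and "Result" columns and that team names are spelled identically in both files. Neither is confirmed by the question, and the team names and results are made up for illustration. The two DataFrames stand in for what pd.read_excel would return for each spreadsheet.

```python
import pandas as pd

# Hypothetical stand-ins for the two spreadsheets. In practice they would be
# loaded with pd.read_excel(...) from the source file and the scraped file.
source = pd.DataFrame({
    "Home": ["Team A", "Team C", "Team E"],
    "Away": ["Team B", "Team D", "Team F"],
    "Result": ["Home", "Draw", "Away"],  # assumed column layout
})
scraped = pd.DataFrame({
    "Home": ["Team C", "Team A", "Team G"],
    "Away": ["Team D", "Team B", "Team H"],
})

# Left join: keep every scraped fixture and attach the Result wherever the
# same (Home, Away) pair also exists in the source file.
matched = scraped.merge(source, on=["Home", "Away"], how="left", indicator=True)

# Rows present in both files are flagged "both" in the _merge column.
found = matched[matched["_merge"] == "both"].drop(columns="_merge")
print(found)
```

This kind of operation is usually called a join (or merge) on key columns. Fixtures that exist only in the scraped file stay in `matched` with an empty Result, which makes spelling mismatches between the two files easy to spot.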

CodePudding user response:

Hi Wilian, thanks for your answer.

I just tried the code you gave me, but it shows the error below. Since I've never worked with Selenium, I can't understand the problem.

DevTools listening on ws://127.0.0.1:24083/devtools/browser/c0807040-1041-4a0e-a9ed-f1add52c5484
Input the date, hit Enter for today: 
Traceback (most recent call last):
  File "c:\Users\dgayg\Desktop\pyBet\testmatch.py", line 19, in <module>
    ListHome.append(game.find_element(By.XPATH,".//span[contains(@class,'team--time-home')]").text)
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 718, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT,
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 418, in execute
    self.error_handler.check_response(response)
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//span[contains(@class,'team--time-home')]"}
  (Session info: headless chrome=95.0.4638.69)
Stacktrace:
Backtrace:
        Ordinal0 [0x00880C43+2493507]
        Ordinal0 [0x0081A4B1+2073777]
        Ordinal0 [0x00722608+1058312]
        Ordinal0 [0x0074CAA4+1231524]
        Ordinal0 [0x00743621+1193505]
        Ordinal0 [0x0076597A+1333626]
        Ordinal0 [0x007435A6+1193382]
        Ordinal0 [0x00765A2A+1333802]
        Ordinal0 [0x00775038+1396792]
        Ordinal0 [0x0076580B+1333259]
        Ordinal0 [0x00742314+1188628]
        Ordinal0 [0x0074316F+1192303]
        GetHandleVerifier [0x00A07BF6+1548950]
        GetHandleVerifier [0x00AB461C+2256060]
        GetHandleVerifier [0x0090C13B+518107]
        GetHandleVerifier [0x0090B1E0+514176]
        Ordinal0 [0x0081F53D+2094397]
        Ordinal0 [0x00823418+2110488]
        Ordinal0 [0x00823552+2110802]
        Ordinal0 [0x0082CE81+2150017]
        BaseThreadInitThunk [0x7518FA29+25]
        RtlGetAppContainerNamedObjectPath [0x76FA7A9E+286]
        RtlGetAppContainerNamedObjectPath [0x76FA7A6E+238]

Thanks a lot for your time!
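The NoSuchElementException in the traceback says that, for at least one matched article, no span with the expected class was found. Why that happens on this page is not established here; the markup may simply differ for some fixtures. One possible way to keep the loop from aborting is to use find_elements, which returns an empty list instead of raising. The helper name first_text and the FakeElement stand-in below are hypothetical and exist only so the sketch runs without a browser. With Selenium, `parent` would be each `game` element and `by` would be By.XPATH.

```python
def first_text(parent, by, selector):
    """Return the text of the first match, or None when nothing matches."""
    # find_elements returns an empty list when there is no match,
    # whereas find_element raises NoSuchElementException.
    matches = parent.find_elements(by, selector)
    return matches[0].text if matches else None


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for demonstration only."""

    def __init__(self, text="", children=None):
        self.text = text
        self._children = children or {}

    def find_elements(self, by, selector):
        return self._children.get(selector, [])


HOME = ".//span[contains(@class,'team--time-home')]"
game_ok = FakeElement(children={HOME: [FakeElement("Team A")]})
game_missing = FakeElement()

# "xpath" is the string value behind Selenium's By.XPATH.
print(first_text(game_ok, "xpath", HOME))
print(first_text(game_missing, "xpath", HOME))
```

In the scraping loop, a fixture whose home or away name comes back as None could be skipped or logged, which also shows which articles use different markup.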
