I am making betting software that scrapes fixtures from a website, saves them to an Excel file, and then compares that file with another Excel file I created manually (the source) to find the winner. The source file looks like this, and the scraped output looks like this.
I want the software to find the fixtures that appear in both files, then determine the result using the "Result" column of the source file.
The problem is that I don't know how to do this, and I don't even know what to search for. I'm a complete beginner and I picked this project for school. Could someone at least tell me what this is called (if it has a name), or give me an idea or a path to follow?
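What is described here is usually called a join (pandas calls it a merge): rows from two tables are matched on one or more key columns. Below is a minimal sketch, assuming both sheets have `Home` and `Away` columns and the source also has a `Result` column. The team names and results are made-up placeholders, and with real files each table would come from `pd.read_excel(...)`.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_excel("source.xlsx") and the scraped sheet
source = pd.DataFrame({
    "Home": ["Team A", "Team C", "Team E"],
    "Away": ["Team B", "Team D", "Team F"],
    "Result": ["Team A", "Draw", "Team F"],
})
scraped = pd.DataFrame({
    "Home": ["Team C", "Team A", "Team G"],
    "Away": ["Team D", "Team B", "Team H"],
})

# Inner join: keep only fixtures present in both tables, matched on Home AND Away
matched = scraped.merge(source, on=["Home", "Away"], how="inner")
print(matched)
```

Each row of `matched` pairs a scraped fixture with its `Result`; fixtures that appear in only one of the two tables are dropped, which is what `how="inner"` means.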
Thanks a lot, everyone!
Here is my code so far (it only scrapes the page and creates the Excel file):
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
# Build the URL to scrape
url = "https://www.bbc.com/sport/football/scores-fixtures/"
date = input("Input the date, hit Enter for today: ")
src = url + date
# Scrape the web page: <abbr> tags that have a title attribute
html = requests.get(src).text
soup = bs(html, 'lxml')
select = soup.find_all('abbr')
fixtures = []
for abbr in select:
    if abbr.has_attr('title'):
        output = str(abbr['title'])
        fixtures.append(output)
# Build the Excel table; list[start::step] takes every step-th item beginning at start
table = pd.DataFrame()
table['Home'] = pd.Series(fixtures[::2])
table['Away'] = pd.Series(fixtures[1::2])
name = 'fixtures-' + date + '.xlsx'
table.to_excel(name, index=False)
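`fixtures[::2]` takes items 0, 2, 4, ... and `fixtures[1::2]` takes items 1, 3, 5, ..., so this pairing only works if the `title` values come strictly in home, away, home, away order. The sketch below isolates that step so it can be checked without downloading anything; the function name `build_fixture_table` and the sample titles are hypothetical.

```python
import pandas as pd


def build_fixture_table(titles):
    """Pair consecutive titles as (home, away); assumes strict home/away alternation."""
    if len(titles) % 2 != 0:
        # An odd count means the alternation assumption is broken somewhere
        raise ValueError(f"expected an even number of titles, got {len(titles)}")
    return pd.DataFrame({"Home": titles[::2], "Away": titles[1::2]})


table = build_fixture_table(["Team A", "Team B", "Team C", "Team D"])
print(table)
```

Unlike the `pd.Series` version, which silently fills a missing away team with NaN (or drops an extra entry), this raises an error, so a mismatched scrape is easier to spot.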
Answer:
The table on this page is not easy to work with, so I suggest using Selenium:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
options = webdriver.ChromeOptions()
options.add_argument('--no-default-browser-check')
options.add_argument('--log-level=3')
options.add_argument('--headless')
service = Service('chromedriver.exe')
driver = webdriver.Chrome(options=options, service=service)
date = input("Input the date, hit Enter for today: ")
driver.get("https://www.bbc.com/sport/football/scores-fixtures/" + date)
# Select every <article> that has a data-event-id attribute (assumed to be one per game)
games = driver.find_elements(By.XPATH, "//article[@data-event-id]")
ListHome = []
ListAway = []
for game in games:
    ListHome.append(game.find_element(By.XPATH, ".//span[contains(@class,'team--time-home')]").text)
    ListAway.append(game.find_element(By.XPATH, ".//span[contains(@class,'team--time-away')]").text)
driver.quit()  # close the headless browser
df = pd.DataFrame(list(zip(ListHome, ListAway)), columns=['Home', 'Away'])
df.to_excel('name.xlsx', index=False)
Use pd.merge to compare with your other spreadsheet.
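A sketch of that comparison that also reports scraped fixtures with no match in the source, using `how="left"` together with `indicator=True` (which adds a `_merge` column whose value is `"both"` or `"left_only"`). Team names are stripped and lowercased first, since a stray space or different capitalization would otherwise prevent a match; that normalization and the function name `compare_fixtures` are assumptions, not part of the answer above. It assumes both tables have `Home` and `Away` columns and the source has `Result`.

```python
import pandas as pd


def normalize(names):
    # Trim whitespace and lowercase so "Team A " and "team a" compare equal
    return names.str.strip().str.lower()


def compare_fixtures(scraped, source):
    scraped = scraped.assign(Home=normalize(scraped["Home"]), Away=normalize(scraped["Away"]))
    source = source.assign(Home=normalize(source["Home"]), Away=normalize(source["Away"]))
    # Left join keeps every scraped fixture; _merge tells us whether it was found
    merged = scraped.merge(source, on=["Home", "Away"], how="left", indicator=True)
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    unmatched = merged[merged["_merge"] == "left_only"].drop(columns=["_merge", "Result"])
    return matched, unmatched


# Hypothetical sample data
scraped = pd.DataFrame({"Home": ["Team A ", "Team G"], "Away": ["team b", "Team H"]})
source = pd.DataFrame({"Home": ["Team A"], "Away": ["Team B"], "Result": ["Team A"]})
matched, unmatched = compare_fixtures(scraped, source)
print(matched)
print(unmatched)
```

With real files this would be something like `compare_fixtures(pd.read_excel("name.xlsx"), pd.read_excel("source.xlsx"))`, where `source.xlsx` stands in for your own file name; reading and writing `.xlsx` files with pandas requires the `openpyxl` package.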
Reply:
Hi Wilian, thanks for your answer.
I just tried the code you gave me, but it raises the error below. Since I've never worked with Selenium, I can't figure out the problem.
DevTools listening on ws://127.0.0.1:24083/devtools/browser/c0807040-1041-4a0e-a9ed-f1add52c5484
Input the date, hit Enter for today:
Traceback (most recent call last):
  File "c:\Users\dgayg\Desktop\pyBet\testmatch.py", line 19, in <module>
    ListHome.append(game.find_element(By.XPATH,".//span[contains(@class,'team--time-home')]").text)
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 718, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT,
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 418, in execute
    self.error_handler.check_response(response)
  File "C:\Users\dgayg\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//span[contains(@class,'team--time-home')]"}
  (Session info: headless chrome=95.0.4638.69)
Stacktrace:
Backtrace:
Ordinal0 [0x00880C43+2493507]
Ordinal0 [0x0081A4B1+2073777]
Ordinal0 [0x00722608+1058312]
Ordinal0 [0x0074CAA4+1231524]
Ordinal0 [0x00743621+1193505]
Ordinal0 [0x0076597A+1333626]
Ordinal0 [0x007435A6+1193382]
Ordinal0 [0x00765A2A+1333802]
Ordinal0 [0x00775038+1396792]
Ordinal0 [0x0076580B+1333259]
Ordinal0 [0x00742314+1188628]
Ordinal0 [0x0074316F+1192303]
GetHandleVerifier [0x00A07BF6+1548950]
GetHandleVerifier [0x00AB461C+2256060]
GetHandleVerifier [0x0090C13B+518107]
GetHandleVerifier [0x0090B1E0+514176]
Ordinal0 [0x0081F53D+2094397]
Ordinal0 [0x00823418+2110488]
Ordinal0 [0x00823552+2110802]
Ordinal0 [0x0082CE81+2150017]
BaseThreadInitThunk [0x7518FA29+25]
RtlGetAppContainerNamedObjectPath [0x76FA7A9E+286]
RtlGetAppContainerNamedObjectPath [0x76FA7A6E+238]
Thanks a lot for your time!
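`NoSuchElementException` means the XPath `.//span[contains(@class,'team--time-home')]` matched nothing inside at least one of the selected `article` elements: either the page markup no longer uses that class, or some of those articles simply lack that span. `find_element` raises in that case, whereas `find_elements` returns an empty list. The sketch below uses that to skip such articles instead of crashing. `first_text` and `collect_fixtures` are hypothetical names, and the code passes the locator string `"xpath"` (the value of Selenium's `By.XPATH`) so it can be tested without a browser. It does not fix the selector itself: if every article is skipped, inspect the live page and update the class names.

```python
def first_text(element, xpath):
    """Return the text of the first match for xpath under element, or None if nothing matches."""
    # find_elements returns [] when nothing matches instead of raising NoSuchElementException
    matches = element.find_elements("xpath", xpath)  # "xpath" == By.XPATH
    return matches[0].text if matches else None


def collect_fixtures(games):
    """Build (home, away) pairs, skipping articles that lack either team span."""
    home_xpath = ".//span[contains(@class,'team--time-home')]"
    away_xpath = ".//span[contains(@class,'team--time-away')]"
    rows = []
    for game in games:
        home = first_text(game, home_xpath)
        away = first_text(game, away_xpath)
        if home is None or away is None:
            continue  # this article does not have both team names; skip it
        rows.append((home, away))
    return rows
```

In the answer's script this would replace the loop, e.g. `rows = collect_fixtures(games)` followed by `pd.DataFrame(rows, columns=['Home', 'Away'])`.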