The web scraper below works, but it also picks up hyperlinks that are unrelated to the tables on the page. I would like help restricting the selection to only the relevant tennis match hyperlinks inside the table with class "table-main only12 js-nrbanner-t".
import requests
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get('https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02')
soup = BeautifulSoup(r.text, "html.parser")
matchlist = set('https://www.betexplorer.com' + a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)'))
print(pd.DataFrame(matchlist))
Edit: Driftr95 has found the exact solution I was looking for, even though I didn't phrase the question correctly.
CodePudding user response:
You can just add the table to the selector passed to select:
tLinkSel = 'table.table-main.only12.js-nrbanner-t a[href^="/tennis"]:has(strong)'
matchlist = set('https://www.betexplorer.com' + a.get('href') for a in soup.select(tLinkSel))
I should mention that I did not see any difference in the results when searching in dev tools, but this will limit the links to only those in the table.
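To see what the table prefix changes, here is a minimal sketch that runs both selectors against a small hand-written HTML snippet instead of the live page. The markup in SAMPLE_HTML is hypothetical (only the table class and the link pattern come from the question), so treat it as an illustration of the scoping, not a copy of the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one matching link outside the results table
# and one inside it. The real page will look different.
SAMPLE_HTML = """
<div class="sidebar">
  <a href="/tennis/some-tournament/"><strong>Sidebar link</strong></a>
</div>
<table class="table-main only12 js-nrbanner-t">
  <tr>
    <td><a href="/tennis/some-tournament/player-a-player-b/"><strong>Player A</strong> - Player B</a></td>
  </tr>
</table>
"""

BASE_URL = "https://www.betexplorer.com"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Original selector: any /tennis link that contains a <strong> element.
loose = 'a[href^="/tennis"]:has(strong)'
# Scoped selector: the same links, but only inside the results table.
scoped = "table.table-main.only12.js-nrbanner-t " + loose

loose_links = sorted(BASE_URL + a["href"] for a in soup.select(loose))
scoped_links = sorted(BASE_URL + a["href"] for a in soup.select(scoped))

print(len(loose_links), "links with the original selector")
print(scoped_links)
```

On this sample the original selector matches both links, while the scoped one keeps only the link inside the table.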
Additional EDIT:
You can target specific dates with the data-dt attribute of the rows [tr]; for example, for Nov 2, 2022, you can set
tLinkSel = 'tr[data-dt^="2,11,2022,"] a[href^="/tennis"]:has(strong)'
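If you need this for more than one date, the selector can be built from a date object. This is only a sketch: the helper name row_link_selector is made up here, and the day,month,year order without zero padding is an assumption inferred from the single "2,11,2022," example above, so check it against the page before relying on it.

```python
from datetime import date


def row_link_selector(day: date) -> str:
    """Build the row-scoped selector for one date (hypothetical helper).

    Assumes data-dt starts with "day,month,year," without zero padding,
    as in the "2,11,2022," example.
    """
    prefix = f"{day.day},{day.month},{day.year},"
    return f'tr[data-dt^="{prefix}"] a[href^="/tennis"]:has(strong)'


tLinkSel = row_link_selector(date(2022, 11, 2))
print(tLinkSel)
```

The resulting string can be passed to soup.select in the same way as the selectors above.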