How do I apply table class criteria in a web-scraper through python?

Time:11-10

Although the web scraper below works, it also returns hyperlinks that are unrelated to the tables on the page. I would like help restricting the selection to only the relevant tennis match hyperlinks inside the table with class "table-main only12 js-nrbanner-t".

import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get('https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02')
soup = BeautifulSoup(r.text, "html.parser")

matchlist = set('https://www.betexplorer.com' + a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)'))

print(pd.DataFrame(matchlist))

Edit: Driftr95 found the exact solution I was looking for, even though I didn't phrase the question correctly.

CodePudding user response:

You can add the table to the selector passed to select:

tLinkSel = 'table.table-main.only12.js-nrbanner-t a[href^="/tennis"]:has(strong)'
matchlist = set('https://www.betexplorer.com' + a.get('href') for a in soup.select(tLinkSel))

I should mention that I did not see any difference in the results when searching in dev tools, but this will limit the links to only those inside the table.
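
To see what the table-scoped selector changes, here is a minimal sketch run against a small inline HTML snippet instead of the live page. The markup, the sidebar link and the match URLs are made up for illustration; only the class names and the two selectors come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
# The sidebar link is invented to show a match outside the table.
SAMPLE_HTML = """
<div class="sidebar">
  <a href="/tennis/atp-singles/paris/"><strong>Sidebar link</strong></a>
</div>
<table class="table-main only12 js-nrbanner-t">
  <tr><td><a href="/tennis/atp-singles/paris/player-a-player-b/abc123/"><strong>Player A</strong> - Player B</a></td></tr>
  <tr><td><a href="/tennis/atp-singles/paris/player-c-player-d/def456/">Player C - <strong>Player D</strong></a></td></tr>
</table>
"""

BASE_URL = 'https://www.betexplorer.com'
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Selector from the question: any /tennis link that contains a <strong>.
broad_sel = 'a[href^="/tennis"]:has(strong)'
# Selector from the answer: the same links, but only inside the results table.
table_sel = 'table.table-main.only12.js-nrbanner-t ' + broad_sel

broad_links = {BASE_URL + a.get('href') for a in soup.select(broad_sel)}
table_links = {BASE_URL + a.get('href') for a in soup.select(table_sel)}

# Links matched by the broad selector but excluded by the table-scoped one.
print(sorted(broad_links - table_links))
```

On this sample the broad selector also picks up the sidebar link, while the table-scoped one returns only the two links inside the table. Whether the real page has such stray links depends on its current markup.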


Additional EDIT:

You can target specific dates with the data-dt attribute of the rows [tr]; for example, for Nov 2, 2022, you can set

tLinkSel = 'tr[data-dt^="2,11,2022,"] a[href^="/tennis"]:has(strong)'
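
If you need this for more than one date, the selector can be built from a date object. This is only a sketch: the helper name date_link_selector is hypothetical, and it assumes data-dt always starts with day,month,year without zero padding, which I am inferring from the single example above. The sample rows and the trailing values in their data-dt attributes are made up as well.

```python
from datetime import date

from bs4 import BeautifulSoup


def date_link_selector(d):
    """Build the row/link selector for one date (hypothetical helper).

    Assumes data-dt begins with "day,month,year," without zero padding,
    as in the Nov 2, 2022 example.
    """
    prefix = f'{d.day},{d.month},{d.year},'
    return f'tr[data-dt^="{prefix}"] a[href^="/tennis"]:has(strong)'


# Invented sample rows; the values after the year are placeholders.
SAMPLE_HTML = """
<table class="table-main only12 js-nrbanner-t">
  <tr data-dt="2,11,2022,10,30"><td><a href="/tennis/x/y/match-1/"><strong>A</strong> - B</a></td></tr>
  <tr data-dt="1,11,2022,23,15"><td><a href="/tennis/x/y/match-2/"><strong>C</strong> - D</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
sel = date_link_selector(date(2022, 11, 2))

# Only links in rows whose data-dt starts with the requested date are kept.
links = [a.get('href') for a in soup.select(sel)]
print(sel)
print(links)
```

Keeping the trailing comma in the prefix matters: without it, a prefix such as "2,11,202" would also match other years that start with the same digits.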