Unable to locate elements using requests and BeautifulSoup

Time:10-05

I am writing a Python script that uses the modules 'requests' and 'BeautifulSoup' to scrape the results of football matches linked from the following page:

https://www.premierleague.com/results?co=1&se=363&cl=-1

The task consists of two steps (taking the first match, Arsenal against Brighton, as an example):

  1. Extract the href "https://www.premierleague.com/match/59266" from the following element and navigate to it:
    div data-template-rendered data-href.

  2. Navigate to the "Stats" tab and extract the information found in the element:
    tbody class = "matchCentreStatsContainer".

I have already tried things like

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.premierleague.com/match/59266")
soup = BeautifulSoup(page.text, "html.parser")
soup.findAll("div", {"class" : "matchCentreStatsContainer"})

but I cannot locate any of the elements from step 1 or step 2 (an empty list is returned).

CodePudding user response:

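Instead of this:

soup.findAll("div", {"class" : "matchCentreStatsContainer"})

use this:

soup.findAll(attrs={"class" : "matchCentreStatsContainer"})

Dropping the tag name and passing the dictionary as attrs matches any tag with that class, not only div elements. Note that the dictionary has to go in attrs: the first positional argument of findAll is the tag name filter, so a dictionary passed there is not treated as an attribute filter.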

CodePudding user response:

In this case the problem is simply that you are looking for the wrong thing. There is no <div class="matchCentreStatsContainer"> on that page; the element with that class is a <tbody>, so a search restricted to div does not match it. If you want the div, do:

divs = soup.find_all("div", class_="statsSection")

Otherwise, search for the tbody elements:

soup.find_all("tbody", class_="matchCentreStatsContainer")
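To make the difference concrete, here is a minimal sketch that runs against a small inline HTML string instead of the live page. The markup and the numbers in it are made up for illustration and only mimic the nesting described above (a div with class statsSection wrapping a tbody with class matchCentreStatsContainer); the real page may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the question,
# the table contents are invented for the example.
SAMPLE_HTML = """
<div class="statsSection">
  <table>
    <tbody class="matchCentreStatsContainer">
      <tr><td>10</td><td>Shots</td><td>7</td></tr>
      <tr><td>4</td><td>Corners</td><td>2</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The class sits on a <tbody>, so restricting the search to <div> finds nothing.
div_hits = soup.find_all("div", class_="matchCentreStatsContainer")

# Searching for the right tag name finds the container.
tbody_hits = soup.find_all("tbody", class_="matchCentreStatsContainer")

# Turn each table row into a list of its cell texts.
rows = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in tbody_hits[0].find_all("tr")
]

print(len(div_hits), len(tbody_hits))
print(rows)
```

The same tag-plus-class pattern should carry over to the real page, provided the tbody is actually present in the HTML that requests receives.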

Incidentally, the Right Way (TM) to match classes is with the class_ keyword argument, which takes either a string (for a single class) or a list of strings (to match any of several classes). It was added to bs4 a while back, but the old dictionary syntax is still floating around a lot.
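A short sketch of the two forms of class_, again on made-up inline markup (the class names here are placeholders, not taken from the real page). It also shows that the older dictionary syntax returns the same elements.

```python
from bs4 import BeautifulSoup

# Placeholder markup with single and multiple classes per tag.
SAMPLE_HTML = """
<p class="home score">first</p>
<p class="away score">second</p>
<p class="venue">third</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A string matches every tag that carries that class among its classes.
by_string = [p.get_text() for p in soup.find_all("p", class_="score")]

# A list matches tags that carry any one of the listed classes.
by_list = [p.get_text() for p in soup.find_all("p", class_=["home", "venue"])]

# The older dictionary syntax selects the same tags as class_="score".
old_style = [p.get_text() for p in soup.find_all("p", {"class": "score"})]

print(by_string)
print(by_list)
print(old_style)
```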

Do note that your first URL as posted here is invalid: it needs an http: or https: in front of it.
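For step 1, a possible sketch of collecting the data-href values and making sure each one ends up with a scheme. It assumes, without having checked the live page, that the attribute holds a scheme-less value such as "//www.premierleague.com/match/59266"; the sample markup and the BASE_URL name are illustrative only. urljoin from the standard library fills in the scheme from the base URL and leaves values that are already absolute unchanged.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base: the results page the links were found on.
BASE_URL = "https://www.premierleague.com/results"

# Hypothetical markup; the scheme-less data-href value is an assumption.
SAMPLE_HTML = """
<div data-template-rendered data-href="//www.premierleague.com/match/59266"></div>
<div class="unrelated"></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# attrs={"data-href": True} keeps only the divs that have the attribute at all.
match_urls = [
    urljoin(BASE_URL, div["data-href"])
    for div in soup.find_all("div", attrs={"data-href": True})
]

print(match_urls)
```

Each resulting URL can then be passed to requests.get in the same way as the match page in the question.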
