Multiple errors when scraping premier league tables

Time:04-24

I am learning web-scraping.

I succeeded in scraping a top YouTubers ranking, using this as a reference.

I am using the same logic to scrape the PL ranking, but I am running into three issues:

  1. it only collects rows up to 5th place,
  2. it only gets the first place for the result,
  3. and then it raises an AttributeError:


    from bs4 import BeautifulSoup
    import requests
    import csv


    url = 'https://www.premierleague.com/tables'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    
    standings = soup.find('div', attrs={'data-ui-tab': 'First Team'}).find_all('tr')[1:]
    print(standings)
    
    file = open("pl_standings.csv", 'w')
    writer = csv.writer(file)
    
    writer.writerow(['position', 'club_name', 'points'])
    
    for standing in standings:
        position = standing.find('span', attrs={'class': 'value'}).text.strip()
        club_name = standing.find('span', {'class': 'long'}).text
        points = standing.find('td', {'class': 'points'}).text
    
        print(position, club_name, points)
    
        writer.writerow([position, club_name, points])
    
    file.close()

CodePudding user response:

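The issue is that html.parser doesn't parse the page correctly (try using the lxml parser instead). Also, take only every second <tr> to get correct results. The rows in between apparently lack the elements you are looking for, so find() returns None for them and accessing .text raises the AttributeError: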

    import requests
    from bs4 import BeautifulSoup


    url = "https://www.premierleague.com/tables"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "lxml") # <-- use lxml

    standings = soup.find("div", attrs={"data-ui-tab": "First Team"}).find_all(
        "tr"
    )[1::2]  # <-- get every second <tr>

    for standing in standings:
        position = standing.find("span", attrs={"class": "value"}).text.strip()
        club_name = standing.find("span", {"class": "long"}).text
        points = standing.find("td", {"class": "points"}).text
        print(position, club_name, points)

Prints:

    1 Manchester City 77
    2 Liverpool 76
    3 Chelsea 62
    4 Tottenham Hotspur 57
    5 Arsenal 57
    6 Manchester United 54
    7 West Ham United 52
    8 Wolverhampton Wanderers 49
    9 Leicester City 41
    10 Brighton and Hove Albion 40
    11 Newcastle United 40
    12 Brentford 39
    13 Southampton 39
    14 Crystal Palace 37
    15 Aston Villa 36
    16 Leeds United 33
    17 Everton 29
    18 Burnley 28
    19 Watford 22
    20 Norwich City 21
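
If you also want the CSV output from the question, here is a minimal sketch of one way to do it. Instead of relying on the every-second-row slice, it skips any row that lacks the expected elements, which avoids the AttributeError as well. The `SAMPLE_HTML` markup and the `parse_standings` and `write_csv` helpers are assumptions made up for illustration; the markup is a simplified stand-in, not the site's real HTML. The sample is small and well-formed, so it uses html.parser to stay free of extra dependencies.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not the real page):
# each club row is followed by a detail row without the elements we need.
SAMPLE_HTML = """
<div data-ui-tab="First Team">
  <table>
    <tr><th>Position</th><th>Club</th><th>Points</th></tr>
    <tr>
      <td><span class="value">1</span></td>
      <td><span class="long">Manchester City</span></td>
      <td class="points">77</td>
    </tr>
    <tr class="expandable"><td colspan="3">details</td></tr>
    <tr>
      <td><span class="value">2</span></td>
      <td><span class="long">Liverpool</span></td>
      <td class="points">76</td>
    </tr>
    <tr class="expandable"><td colspan="3">details</td></tr>
  </table>
</div>
"""


def parse_standings(html, parser="html.parser"):
    """Return (position, club_name, points) tuples for complete rows only."""
    soup = BeautifulSoup(html, parser)
    container = soup.find("div", attrs={"data-ui-tab": "First Team"})
    if container is None:
        return []

    rows = []
    for tr in container.find_all("tr"):
        position = tr.find("span", attrs={"class": "value"})
        club = tr.find("span", attrs={"class": "long"})
        points = tr.find("td", attrs={"class": "points"})
        # find() returns None when nothing matches; skip such rows
        # instead of calling .text on None.
        if position is None or club is None or points is None:
            continue
        rows.append(
            (position.text.strip(), club.text.strip(), points.text.strip())
        )
    return rows


def write_csv(rows, file_obj):
    """Write a header line followed by the standings rows."""
    writer = csv.writer(file_obj)
    writer.writerow(["position", "club_name", "points"])
    writer.writerows(rows)


rows = parse_standings(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for a real file in this sketch
write_csv(rows, buffer)
print(buffer.getvalue())
```

For the live page you would pass `page.content` and `"lxml"` to `parse_standings`, and write to a real file opened with `open("pl_standings.csv", "w", newline="")`, since the csv module documentation recommends `newline=""` for files handed to `csv.writer`.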