How to select elements without a class using beautifulsoup

Time:08-12

I am scraping the FBref website to get specific player info so that I can use it for further analysis. I have selected the table I want to scrape. The information I want is in <tr> tags without any class attribute. The issue is that this table also has many header rows in <tr> tags that do have a class name.

import requests
from bs4 import BeautifulSoup
from time import sleep
url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

response = requests.get(url).text.replace('<!--', '').replace('-->', '')

soup = BeautifulSoup(response, "html.parser")

I have selected the desired table I want to scrape. I want to select <tr> tags that don't have any class attribute because that's where the information I want is located.

players_table = soup.select("table#stats_standard tbody tr", class_ =None)

I have then looped through the players_table so that I can get each player's info like name, country, position, etc.

for player in players_table:
    player_name = player.find("td", attrs={"data-stat": "player"}).a.text
    print(player_name)
    sleep(2)

But now the problem is that my code loops through the table, and when it reaches one of the header <tr> tags, it tries to look for its <a> tag and then read the text in that <a> tag. This specific <tr> tag doesn't have any <a> tags, so my code breaks with the error message 'NoneType' object has no attribute 'a' when I try to run it.

My code prints the names of the players until it reaches this <tr> tag with no <a>, and then it fails. I have even tried to decompose or clear this <tr> tag, but it still doesn't work.

player.find(".thead").decompose()

So my question is: how can I select only the <tr> tags that don't have any class, so that when my loop reaches a header row, it just skips it? I have tried doing that by passing class_=None when selecting the table rows:

players_table = soup.select("table#stats_standard tbody tr", class_ =None)

But this didn't solve anything. I need your help on this, please.

CodePudding user response:

If you only want to exclude the subheaders, adjust your selector so that it only selects the <tr> elements that do not have the class thead:

soup.select('table#stats_standard tbody tr:not(.thead)')

Or, closer to the title of your question, select only the <tr> elements that do not have a class attribute at all:

soup.select('table#stats_standard tbody tr:not([class])')
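The difference between the two selectors can be checked offline on a small piece of markup. The snippet below is only a sketch: the HTML is made up for illustration (it is not FBref's real markup), and the `spacer` class and the `names` helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup: one plain row, one subheader row, one row with an unrelated class
html = """
<table id="stats_standard">
  <tbody>
    <tr><td data-stat="player"><a href="#">Max Aarons</a></td></tr>
    <tr class="thead"><th data-stat="player">Player</th></tr>
    <tr class="spacer"><td data-stat="player"><a href="#">Che Adams</a></td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def names(row_selector):
    """Return the link text of every player cell inside the selected rows."""
    links = soup.select(f"{row_selector} td[data-stat='player'] a")
    return [a.get_text() for a in links]


# Skips only the rows whose class is "thead"
not_thead = names("table#stats_standard tbody tr:not(.thead)")
# Skips every row that carries any class attribute
no_class = names("table#stats_standard tbody tr:not([class])")

print(not_thead)
print(no_class)
```

If some data rows on the real page carry a class of their own, `:not([class])` would drop them too, so `:not(.thead)` is the narrower choice when only the subheaders should go.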

Example

import requests
from bs4 import BeautifulSoup
from time import sleep
url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

response = requests.get(url).text.replace('<!--', '').replace('-->', '')

soup = BeautifulSoup(response, "html.parser")

for player in soup.select('table#stats_standard tbody tr:not([class])'):
    player_name = player.find("td", attrs={"data-stat" : "player"}).a.text   
    print(player_name)
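If you would rather keep the original selector, another option is to guard against the missing cell inside the loop, since `find` returns `None` when a row has no matching `<td>`. This is a minimal sketch on made-up markup, not the real page:

```python
from bs4 import BeautifulSoup

# Made-up markup with a subheader row between two player rows
html = """
<table id="stats_standard">
  <tbody>
    <tr><td data-stat="player"><a href="#">Max Aarons</a></td></tr>
    <tr class="thead"><th data-stat="player">Player</th></tr>
    <tr><td data-stat="player"><a href="#">Che Adams</a></td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

player_names = []
for row in soup.select("table#stats_standard tbody tr"):
    cell = row.find("td", attrs={"data-stat": "player"})
    # Header rows use <th>, so find() returns None for them; skip those rows
    if cell is None or cell.a is None:
        continue
    player_names.append(cell.a.get_text(strip=True))

print(player_names)
```

The guard also protects against any other row that happens to lack a player link, at the cost of silently skipping it.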

CodePudding user response:

Why not just let pandas parse that? Then you can do whatever you want with the table.

from io import StringIO

import pandas as pd
import requests

url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

# Strip the HTML comment markers so the commented-out tables become parseable
response = requests.get(url).text.replace('<!--', '').replace('-->', '')
# Wrap the markup in StringIO; newer pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(response), header=1)[-1]
# Drop the repeated header rows that were read in as data
df = df[df['Rk'].ne('Rk')]

Output:

print(df)
      Rk                Player   Nation    Pos  ... xG xA npxG.1 npxG xA.1  Matches
0      1            Max Aarons  eng ENG     DF  ...  0.07   0.02      0.07  Matches
1      2             Che Adams  sct SCO     FW  ...  0.43   0.31      0.43  Matches
2      3       Rayan Aït Nouri   fr FRA     DF  ...  0.10   0.04      0.10  Matches
3      4       Kristoffer Ajer   no NOR     DF  ...  0.10   0.04      0.10  Matches
4      5            Nathan Aké   nl NED     DF  ...  0.16   0.11      0.16  Matches
..   ...                   ...      ...    ...  ...   ...    ...       ...      ...
562  542         Wilfried Zaha   ci CIV     FW  ...  0.46   0.13      0.29  Matches
563  543  Christoph Zimmermann   de GER     DF  ...  0.04   0.04      0.04  Matches
564  544   Oleksandr Zinchenko   ua UKR     DF  ...  0.21   0.04      0.21  Matches
565  545          Hakim Ziyech   ma MAR  FW,MF  ...  0.47   0.23      0.47  Matches
566  546            Kurt Zouma   fr FRA     DF  ...  0.04   0.04      0.04  Matches

[546 rows x 33 columns]

Or, to print only the player names:

for player in df['Player']:
    print(player)
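To see what the `df['Rk'].ne('Rk')` filter does, here is a small sketch on a hand-built frame. The `Gls` values are invented for illustration, and the numeric conversion at the end is an optional extra step, assuming you want numbers rather than strings:

```python
import pandas as pd

# Hand-built frame that mimics a table whose header row repeats inside the data
df = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3"],
    "Player": ["Max Aarons", "Che Adams", "Player", "Rayan Aït Nouri"],
    "Gls": ["0", "7", "Gls", "1"],  # invented values
})

# Keep only the rows where the Rk cell is not the literal header text "Rk"
clean = df[df["Rk"].ne("Rk")].reset_index(drop=True)

# The repeated headers force every column to strings; convert the numeric ones back
clean[["Rk", "Gls"]] = clean[["Rk", "Gls"]].apply(pd.to_numeric)

print(clean)
```

Resetting the index is optional; without it the row labels keep gaps where the header rows were removed.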