I am trying to scrape a website and get values out of a table using Python. This goes well until I want to grab only the value (so without the HTML).
I try to get the value out of the field using the following code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import requests
req = Request('https://www.formula1.com/en/results.html/2022/drivers.html', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage,'html.parser')
drivers = soup.find('table',class_='resultsarchive-table').find_all('tr')
for driver in drivers:
    rank = driver.find('td', class_='dark')
    first = driver.find('span', class_='hide-for-tablet')
    last = driver.find('span', class_='hide-for-mobile')
    print(rank)
When I use .text or .get_text() I get the error AttributeError: 'NoneType' object has no attribute, even though the code above does print values.
What am I doing wrong?
CodePudding user response:
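The issue here is that you also grab the row with the table headers, which does not contain any <td>, so find() returns None for that row and calling .text on None raises the AttributeError. You can simply slice that row off: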
for driver in drivers[1:]:
    rank = driver.find('td', class_='dark').text
    first = driver.find('span', class_='hide-for-tablet').text
    last = driver.find('span', class_='hide-for-mobile').text
    print(rank)
Or select more specifically, for example with CSS selectors, so that only rows containing a <td> are matched:
drivers = soup.select('table.resultsarchive-table tr:has(td)')
for driver in drivers:
    rank = driver.find('td', class_='dark').text
    first = driver.find('span', class_='hide-for-tablet').text
    last = driver.find('span', class_='hide-for-mobile').text
    print(rank)
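If you would rather not depend on the header being the first row, a third option is to check for None before reading the text. Below is a minimal sketch of that idea. It runs against a small inline HTML snippet instead of the live page, so the markup, the driver names and the helper name parse_drivers are made up for illustration; only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page: one header row (<th> only)
# followed by two data rows. The names are placeholders, not real results.
HTML = """
<table class="resultsarchive-table">
  <tr><th>Pos</th><th>Driver</th></tr>
  <tr>
    <td class="dark">1</td>
    <td><span class="hide-for-tablet">Alice</span> <span class="hide-for-mobile">Example</span></td>
  </tr>
  <tr>
    <td class="dark">2</td>
    <td><span class="hide-for-tablet">Bob</span> <span class="hide-for-mobile">Sample</span></td>
  </tr>
</table>
"""


def parse_drivers(html):
    """Return (rank, first, last) tuples, skipping rows that lack any of the cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table", class_="resultsarchive-table").find_all("tr")
    results = []
    for row in rows:
        rank = row.find("td", class_="dark")
        first = row.find("span", class_="hide-for-tablet")
        last = row.find("span", class_="hide-for-mobile")
        # find() returns None when nothing matches, which is what
        # happens for the header row, so skip such rows.
        if rank is None or first is None or last is None:
            continue
        results.append((
            rank.get_text(strip=True),
            first.get_text(strip=True),
            last.get_text(strip=True),
        ))
    return results


for rank, first, last in parse_drivers(HTML):
    print(rank, first, last)
```

Compared with slicing, this also survives extra header or spacer rows elsewhere in the table, at the cost of silently skipping any row whose markup differs from what you expect.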