I am looking to pull the list of presidents from a Wikipedia page. The code works fine for that; however, after it goes down the list and pulls Biden, I get the following error, because there are no other names to pull. Is anyone aware of a way to have it print 'End of list' once it recognizes that there are no other names to pull, instead of raising the error? Thanks!
Traceback (most recent call last):
  File "Filepath\WebScrapingMod6.py", line 11, in
    print(name.get_text('title'))
AttributeError: 'NoneType' object has no attribute 'get_text'
import requests
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tb = soup.find('table', class_='wikitable')
for link in tb.find_all('b'):
    name = link.find('a')
    print(name.get_text('title'))
Answer:
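I suggest looking into try/except statements. Wrapping the code in a try block lets you catch the AttributeError and print a message instead of crashing: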
import requests
from bs4 import BeautifulSoup
try:
    url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    tb = soup.find('table', class_='wikitable')
    for link in tb.find_all('b'):
        name = link.find('a')
        # name is None for a <b> tag without an <a> child, which raises AttributeError
        print(name.get_text('title'))
except AttributeError:
    # Reached when a <b> tag has no link, i.e. after the last name
    print("End of list.")
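Wrapping everything in one try block also swallows any unrelated AttributeError (for example, if soup.find returned None because the table was not found). A narrower variant guards only the lookup on each tag. This is a sketch run against a small inline HTML snippet, SAMPLE_HTML, which is a hypothetical stand-in for the Wikipedia table so the example works offline; print_names is likewise a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table: the last <b> has no <a> child,
# like the 'Sources:' entry on the real page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><td><b><a href="/wiki/Donald_Trump">Donald Trump</a></b></td></tr>
  <tr><td><b><a href="/wiki/Joe_Biden">Joe Biden</a></b></td></tr>
  <tr><td><b>Sources:</b></td></tr>
</table>
"""


def print_names(html):
    """Print each linked name; stop with 'End of list.' at the first <b> without a link."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    names = []
    for bold in table.find_all("b"):
        link = bold.find("a")  # None when this <b> has no <a> child
        try:
            name = link.get_text()
        except AttributeError:
            # Only this lookup is guarded, so unrelated AttributeErrors still surface
            print("End of list.")
            break
        print(name)
        names.append(name)
    return names


names = print_names(SAMPLE_HTML)
```

For the live page, pass requests.get(url).text instead of SAMPLE_HTML.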
Answer:
On the page you're scraping, there's a <b> tag whose text is 'Sources:' and which has no <a> child tag. For that tag, link.find('a') returns None, and calling get_text on None raises the AttributeError. Your code does not account for that situation.
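A minimal sketch of that failure mode, using a hypothetical two-tag HTML fragment rather than the real page: find returns None when there is no matching child, and calling a method on None is what produces the error in the question.

```python
from bs4 import BeautifulSoup

# Two bold tags: one wraps a link, one (like 'Sources:') does not.
soup = BeautifulSoup("<b><a href='/wiki/Joe_Biden'>Joe Biden</a></b><b>Sources:</b>", "html.parser")
with_link, without_link = soup.find_all("b")

print(with_link.find("a").get_text())  # the link text
print(without_link.find("a"))          # None, since there is no <a> inside this <b>

# Calling a method on None is what raises the AttributeError from the traceback.
try:
    without_link.find("a").get_text()
except AttributeError as exc:
    message = str(exc)
    print(message)
```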
I suggest:
import requests
from bs4 import BeautifulSoup as BS
(r := requests.get('https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States')).raise_for_status()
soup = BS(r.text, 'lxml')
for b in soup.find('table', class_='wikitable sortable').find_all('b'):
    # b('a') is shorthand for b.find_all('a'); skip <b> tags with no link, such as 'Sources:'
    if (ba := b('a')):
        print(ba[0].text)
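The approach above skips link-less tags silently, while the question asked for 'End of list' to be printed. Since the loop now finishes normally, the message can simply go after it. This is a sketch under the same assumptions as before: SAMPLE_HTML is a hypothetical stand-in for the Wikipedia table, president_names is a hypothetical helper, and 'html.parser' is used so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table, ending with a link-less <b>.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><td><b><a href="/wiki/Donald_Trump">Donald Trump</a></b></td></tr>
  <tr><td><b><a href="/wiki/Joe_Biden">Joe Biden</a></b></td></tr>
  <tr><td><b>Sources:</b></td></tr>
</table>
"""


def president_names(html):
    """Yield the text of the first <a> in each <b>, skipping <b> tags without a link."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    for bold in table.find_all("b"):
        link = bold.find("a")
        if link is not None:
            yield link.get_text()


names = []
for name in president_names(SAMPLE_HTML):
    print(name)
    names.append(name)
# The loop ends normally once the tags run out, so no exception handling is needed.
print("End of list")
```

Separating the extraction (a generator) from the printing keeps the "no link" check explicit instead of relying on an exception as a signal that the list is over.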