Python webscraping - how to print "end of list" in case of error

Time:03-01

I am looking to pull the list of presidents from a Wikipedia page. The code works fine for that; however, after going down the list and printing Biden, I get the following error because there are no other names to pull. Is anyone aware of a way to have it print 'End of list' once it recognizes that there are no other names to pull, instead of raising the error? Thanks!

Traceback (most recent call last):
  File "Filepath\WebScrapingMod6.py", line 11, in <module>
    print(name.get_text('title'))
AttributeError: 'NoneType' object has no attribute 'get_text'

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tb = soup.find('table', class_='wikitable')

for link in tb.find_all('b'):
    name = link.find('a')
    print(name.get_text('title'))

CodePudding user response:

I suggest looking up try and except statements!

import requests
from bs4 import BeautifulSoup
try:
    url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    tb = soup.find('table', class_='wikitable')

    for link in tb.find_all('b'):
        name = link.find('a')
        print(name.get_text('title'))
except AttributeError:
    print("End of file.")
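
Wrapping the whole script in one try block means the first missing link stops the loop. That looks right in the question's output only because the link-less bold tag comes after the last name. A minimal sketch of a per-item check instead, assuming a small hypothetical HTML snippet (SAMPLE_HTML) in place of the live Wikipedia page and a helper name (collect_names) that is not from the original post:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table: two bold names with links,
# followed by a bold label that has no link (like the 'Sources:' cell).
SAMPLE_HTML = """
<table class="wikitable">
  <tr><td><b><a href="/wiki/A" title="George Washington">George Washington</a></b></td></tr>
  <tr><td><b><a href="/wiki/B" title="John Adams">John Adams</a></b></td></tr>
  <tr><td><b>Sources:</b></td></tr>
</table>
"""


def collect_names(html):
    """Return linked names in order, followed by a final 'End of list' entry."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    lines = []
    for bold in table.find_all("b"):
        link = bold.find("a")
        if link is None:
            # find() returns None when there is no matching child tag,
            # so stop here instead of calling get_text() on None.
            break
        lines.append(link.get_text())
    # Appended whether the loop stopped early or ran to completion.
    lines.append("End of list")
    return lines


for line in collect_names(SAMPLE_HTML):
    print(line)
```

Stopping with break mirrors what the try/except version does; replacing it with continue would skip link-less bold tags and keep going, if names could follow them.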

CodePudding user response:

On the page you're scraping, there's a <b> tag containing the text 'Sources:'. It has no <a> child tag, so link.find('a') returns None, and your code does not account for that case.

I suggest:

import requests
from bs4 import BeautifulSoup as BS

(r := requests.get('https://en.wikipedia.org/wiki/List_of_presidents_of_the_United_States')).raise_for_status()
soup = BS(r.text, 'lxml')

for b in soup.find('table', class_='wikitable sortable').find_all('b'):
    if (ba := b('a')):
        print(ba[0].text)
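
The check above relies on the fact that calling a tag, as in b('a'), is BeautifulSoup shorthand for b.find_all('a'), which returns an empty (falsy) result instead of None. A small sketch of that difference, assuming a hypothetical two-tag fragment rather than the real page, and using html.parser so that lxml does not need to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one <b> with a link and one without.
soup = BeautifulSoup("<b><a>Joe Biden</a></b><b>Sources:</b>", "html.parser")
with_link, without_link = soup.find_all("b")

found = without_link.find("a")       # find() gives None when nothing matches
matches = without_link("a")          # same as find_all("a"): an empty result
first_text = with_link("a")[0].text  # text of the first matching link

print(found, len(matches), first_text)
```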