Why do I get an error when trying to access the first two columns in an HTML table?

Time:12-21

import requests
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_Pixar_films"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url, 'lxml')
table_class = "wikitable plainrowheaders sortable"
my_table = soup.find('table', {'class': table_class})


Film = []
release = []

for row in my_table.find_all('i')[0:]:
    Film_cell = row.find_all('a')[0]
    Film.append(Film_cell.text)
print(Film)

for row in my_table.find_all('td')[0:]:
    release = row.find_all('span')[:1]
    release.append(release.text)
print(release)

Output:

['Toy Story', "A Bug's Life", 'Toy Story 2', 'Monsters, Inc.',
'Finding Nemo', 'The Incredibles', 'Cars', 'Ratatouille', 'WALL-E',
'Up', 'Toy Story 3', 'Cars 2', 'Brave', 'Monsters University', 'Inside Out',
'The Good Dinosaur', 'Finding Dory', 'Cars 3', 'Coco', 'Incredibles 2',
'Toy Story 4', 'Onward', 'Soul', 'Luca', 'Turning Red', 'Lightyear']
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-223-6481bc092354> in <module>
      7 for row in my_table.find_all('td')[0:]:
      8     release = row.find_all('span')[:1]
----> 9     release.append(release.text)
     10 print(release)

AttributeError: 'list' object has no attribute 'text'

CodePudding user response:

for row in my_table.find_all('td')[0:]:
    release= row.find_all('span')[:1]
    release.append(release.text)
print(release)
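  • my_table.find_all('td')[0:] is the same as my_table.find_all('td'); the [0:] slice does nothing.
  • row.find_all('span')[:1] is a list, not a tag, so it has no .text; use row.find_all('span')[0] to get the first span.
  • release = row.find_all('span')[:1] overwrites the release list on every iteration; store the spans in a different variable.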
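
The three points above can be checked on a tiny inline snippet instead of the live page. This is only a sketch: the one-cell HTML string is made up for illustration, and it uses the built-in "html.parser" backend rather than lxml so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell snippet, not the real Wikipedia markup.
html = "<table><tr><td><span>November 22, 1995</span></td></tr></table>"
cell = BeautifulSoup(html, "html.parser").find("td")

spans = cell.find_all("span")   # list-like result holding Tag objects
first_as_list = spans[:1]       # slicing still gives a list (possibly empty)
first_as_tag = spans[0]         # indexing gives a single Tag

print(hasattr(first_as_list, "text"))  # a list has no .text attribute
print(first_as_tag.text)               # the Tag does

release = []                    # the accumulator keeps its own name
release.append(first_as_tag.text)
print(release)
```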

To get the first two columns, not including the index column:

release = []
for row in my_table.find_all('td'):
    span = row.find_all('span')
    if span:
        release.append(span[0].text)
print(release)
[('Toy Story', 'November 22, 1995'), ("A Bug's Life", 'November 25, 1998'),
('Toy Story 2', 'November 24, 1999'), ('Monsters, Inc.', 'November 2, 2001'),
('Finding Nemo', 'May 30, 2003'), ('The Incredibles', 'November 5, 2004'),
('Cars', 'June 9, 2006'), ('Ratatouille', 'June 29, 2007'),
('WALL-E', 'June 27, 2008'), ('Up', 'May 29, 2009'),
('Toy Story 3', 'June 18, 2010'), ('Cars 2', 'June 24, 2011'),
('Brave', 'June 22, 2012'), ('Monsters University', 'June 21, 2013'),
('Inside Out', 'June 19, 2015'), ('The Good Dinosaur', 'November 25, 2015'),
('Finding Dory', 'June 17, 2016'), ('Cars 3', 'June 16, 2017'),
('Coco', 'November 22, 2017'), ('Incredibles 2', 'June 15, 2018'),
('Toy Story 4', 'June 21, 2019'), ('Onward', 'March 6, 2020'),
('Soul', 'December 25, 2020'), ('Luca', 'June 18, 2021'),
('Turning Red[1]', 'March 11, 2022[5]'), ('Lightyear[2]', 'June 17, 2022[5]'),
('TBA', 'June 16, 2023[8]'), ('TBA', 'March 1, 2024[4]'),
('TBA', 'June 14, 2024[4]')]
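
Pairs like the ones above need the title and the date to come from the same table row, which a loop over every td cannot guarantee. One possible way to build such pairs is to walk the rows and read the first two cells of each. The markup below is a hypothetical stand-in loosely modelled on a "plainrowheaders" wikitable (title in a th, date in the first td); the real page may be laid out differently, so treat this as a sketch rather than a drop-in fix.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the structure of the real page is an assumption.
html = """
<table class="wikitable plainrowheaders sortable">
  <tr><th>Film</th><th>Release date</th><th>Other</th></tr>
  <tr><th scope="row"><i><a href="#">Toy Story</a></i></th>
      <td><span>November 22, 1995</span></td><td>other</td></tr>
  <tr><th scope="row"><i><a href="#">A Bug's Life</a></i></th>
      <td><span>November 25, 1998</span></td><td>other</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

pairs = []
for tr in table.find_all("tr"):
    if tr.find("td") is None:              # header row has only <th> cells
        continue
    cells = tr.find_all(["th", "td"])      # cells in document order
    title = cells[0].get_text(strip=True)
    date = cells[1].get_text(strip=True)
    pairs.append((title, date))

print(pairs)
```

Reading both values inside one row keeps each title next to its own date even when some rows lack a span or a link.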

CodePudding user response:

The expression release = row.find_all('span')[:1] produces a list, and a list has no "text" attribute. You need to index into it to reach the element, i.e. release.append(release[0].text) instead of release.append(release.text).

But that change alone will raise an IndexError (list index out of range), because many of the cells in your loop contain no span and so produce an empty list.
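
To see the empty-list problem in isolation, here is a small sketch on a made-up two-cell row (only the first cell has a span). It also shows find(), which returns None when nothing matches, as an alternative to slicing and checking the length. The HTML string and the "html.parser" backend are assumptions chosen to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell row: only the first cell contains a <span>.
html = "<table><tr><td><span>June 9, 2006</span></td><td>plain text</td></tr></table>"
cells = BeautifulSoup(html, "html.parser").find_all("td")

# Indexing the empty result from the second cell raises IndexError.
try:
    cells[1].find_all("span")[0]
    raised = False
except IndexError:
    raised = True

# find() returns None when nothing matches, so it can be tested directly.
dates = []
for cell in cells:
    span = cell.find("span")
    if span is not None:
        dates.append(span.text)

print(raised, dates)
```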

Modified code:

import requests
from bs4 import BeautifulSoup
wiki = "https://en.wikipedia.org/wiki/List_of_Pixar_films"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url,'lxml')
table_class = "wikitable plainrowheaders sortable"
my_table = soup.find('table',{'class':table_class})


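Film = []

# Each film title is the first <a> inside an <i> element.
for row in my_table.find_all('i'):
    Film_cell = row.find_all('a')[0]
    Film.append(Film_cell.text)
print(Film)

new_list = []
for row in my_table.find_all('td'):
    release = row.find_all('span')[:1]
    # Skip cells without a <span>; indexing an empty list would raise IndexError.
    if len(release) > 0:
        new_list.append(release[0].text)
# Print once, after the loop has finished.
print(new_list)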