Webscraping - Beautifulsoup4 - Accessing indexed item in a find_all loop

Time:05-27

How do I make it so that I can choose an item in the list in that for loop?

When I print it without an index, I get the full list, and every entry looks like the item I need:

for h3 in soup.find_all('h3', itemprop="name"):
    bookname = h3.a.text
    bookname = bookname.split('\n')
    print(bookname)

However, when I print a specific index, whether inside the loop or outside it, I get "list index out of range":

for h3 in soup.find_all('h3', itemprop="name"):
    bookname = h3.a.text
    bookname = bookname.split('\n')
    print(bookname[2])

What's my problem here? How do I change my code so that I can scrape all the h3 names, yet at the same time be able to choose specific indexed h3 names when I want to?

Here's the entire code:

import requests
from bs4 import BeautifulSoup

source = requests.get("https://ca1lib.org/s/ginger") #gets the source of the site and returns it
soup = BeautifulSoup(source.text, 'html5lib')

for h3 in soup.find_all('h3', itemprop="name"):
    bookname = h3.a.text
    bookname = bookname.split('\n')
    print(bookname[2])

CodePudding user response:

At first glance, there are two possibilities. If each h3 element contains several book names separated by newlines ("book1" \n "book2" \n "book3"), then some h3 elements probably split into fewer than three pieces, so bookname[2] fails with "list index out of range" on the shorter lists.

If instead each h3 element holds a single name (h3 book1 h3), the loop visits one tag per iteration ("h3 book1 h3" in the first iteration, "h3 book2 h3" in the second), so bookname only ever holds the pieces of the current tag. In that case, build a list of all the h3.a.text values first, then access the value you want from that list. Hope this helps!
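Both cases can be tried out without touching the network. This is only a sketch: the strings in texts are made-up stand-ins for h3.a.text values (the real page's text may look different), and part_at is a hypothetical helper, not part of BeautifulSoup.

```python
# Hypothetical stand-ins for h3.a.text values; the real page may differ.
texts = ["\n\nGinger Book One\n", "Ginger Book Two"]


def part_at(text, index, default=None):
    """Return the index-th piece of text split on newlines, or default."""
    parts = text.split("\n")
    return parts[index] if index < len(parts) else default


# Case 1: a fixed index into each split only works when the text has
# enough pieces, so fall back to a default instead of raising IndexError.
pieces = [part_at(t, 2) for t in texts]

# Case 2: collect one cleaned name per tag, then index the collected list.
names = [t.strip() for t in texts]

print(pieces)
print(names[1])
```

The second string has no newlines, so it splits into a single piece and index 2 does not exist for it. That is the situation that would produce "list index out of range" in the original loop.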

CodePudding user response:

I forgot to append. I figured it out.

Here's my final code:

import requests
from bs4 import BeautifulSoup

source = requests.get("https://ca1lib.org/s/ginger") #gets the source of the site and returns it
soup = BeautifulSoup(source.text, 'html.parser')

liste = []

for h3_tag in soup.find_all('h3', itemprop="name"):
    liste.append(h3_tag.a.text.split("\n"))
    #bookname = h3.a.text #string
    #bookname = bookname.split('\n') #becomes list
print(liste[5])
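A possible variation, in case a flat list of titles is easier to index than a list of split lists. This is a sketch under assumptions: the html string is made-up markup standing in for the real results page, and it assumes each title is the text of the a tag inside the h3.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the search results page.
html = """
<h3 itemprop="name"><a href="/b/1">
Ginger One
</a></h3>
<h3 itemprop="name"><a href="/b/2">Ginger Two</a></h3>
<h3 itemprop="name">No link here</h3>
"""
soup = BeautifulSoup(html, "html.parser")

# One cleaned title per h3; skip tags that have no <a> child,
# since h3.a is None for them and .get_text would fail.
titles = [
    h3.a.get_text(strip=True)
    for h3 in soup.find_all("h3", itemprop="name")
    if h3.a is not None
]

print(titles)
print(titles[1])
```

With get_text(strip=True) the surrounding newlines are removed, so there is no need to split on "\n" and guess which piece holds the title. Indexing titles still fails if the page returns fewer results than the index asked for, so checking len(titles) first is a reasonable precaution.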