BeautifulSoup and Lists

Time:12-31

I am trying to use BeautifulSoup to request and parse each URL listed in a text file. Scraping the first link works, but when the loop moves on to the next URL I hit an error.

This is the error I keep getting:

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

from bs4 import BeautifulSoup as bs
import requests
import constants as c

file = open(c.fvtxt)
read = file.readlines()
res = []
DOMAIN = c.vatican_domain
pdf = []


def get_soup(url):
    return bs(requests.get(url).text, 'html.parser')


for link in read:
    bs = get_soup(link)
    res.append(bs)
    soup = bs.find('div', {'class': 'headerpdf'})
    pdff = soup.find('a')
    li = pdff.get('href')
    surl = f"{DOMAIN}{li}"
    pdf.append(f"{surl}\n")
    print(pdf)

CodePudding user response:

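The problem is the variable name bs. You import the BeautifulSoup class as bs, but inside the loop you assign the parsed page to the same name, so after the first iteration bs no longer refers to the class. On the second iteration get_soup() looks up bs, finds the BeautifulSoup object from the previous pass, and calls it. Calling a BeautifulSoup object is shorthand for find_all(), which returns a ResultSet, and a ResultSet has no find() method. That is why the first link works and the second one raises the AttributeError.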
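You can reproduce this without any network access. The sketch below is only an illustration: the HTML string is made up, and get_soup() takes markup directly instead of a URL so that nothing has to be downloaded.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

# Made-up sample markup standing in for a downloaded page (assumption).
HTML = '<div class="headerpdf"><a href="/doc.pdf">PDF</a></div>'


def get_soup(markup):
    # The global name `bs` is looked up each time this function runs.
    return bs(markup, "html.parser")


bs = BeautifulSoup        # same effect as `from bs4 import BeautifulSoup as bs`
first = get_soup(HTML)    # works: bs is still the class
bs = first                # what `bs = get_soup(link)` does inside the loop
second = get_soup(HTML)   # now calls a BeautifulSoup object, i.e. find_all(...)

print(type(first).__name__)   # BeautifulSoup
print(type(second).__name__)  # ResultSet

try:
    second.find("div")
except AttributeError as exc:
    print("AttributeError:", exc)
```

The second call does not fail inside get_soup(); it quietly returns a ResultSet, and the error only shows up later when find() is called on it.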

It should work if you rename the loop variable from bs to anything else, for example parsed_text:

for link in read:
    parsed_text = get_soup(link)
    res.append(parsed_text)
    soup = parsed_text.find('div', {'class': 'headerpdf'})
    pdff = soup.find('a')
    li = pdff.get('href')
    print(li)
    surl = f"{DOMAIN}{li}"
    pdf.append(f"{surl}\n")
    print(pdf)
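If some pages might not contain the headerpdf div, find() returns None and the next line fails with a different AttributeError. Here is a slightly more defensive sketch of the extraction step. The function name extract_pdf_url, the sample markup, and the https://example.org domain are all hypothetical, and it uses urljoin from the standard library in place of the f-string concatenation above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_pdf_url(html, domain):
    """Return the absolute URL of the first link in div.headerpdf, or None."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("div", {"class": "headerpdf"})
    if header is None:
        return None  # page has no headerpdf block
    anchor = header.find("a")
    if anchor is None or not anchor.get("href"):
        return None  # block exists but contains no usable link
    return urljoin(domain, anchor["href"])


# Made-up sample pages (assumption), so the sketch runs offline.
sample = '<div class="headerpdf"><a href="/docs/file.pdf">PDF</a></div>'
print(extract_pdf_url(sample, "https://example.org"))
print(extract_pdf_url("<p>no header here</p>", "https://example.org"))
```

In the real loop you would pass requests.get(link.strip()).text as the html argument. The strip() matters because readlines() keeps the trailing newline on every line of the file.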
