AttributeError when using soup.find_all

Time:07-09

I was trying to build a web-scraper for data collection for a research project at uni. However, I am not able to scrape the whole website, as there seems to be a problem with soup.find_all...

This is what I've come up with so far:

from bs4 import BeautifulSoup 
import requests
from csv import writer

# Single quotes on the outside, because the search term itself contains double quotes
url = 'https://pubmed.ncbi.nlm.nih.gov/?term=("spontaneous intracranial hypotension"[All Fields] OR "spontaneous cerebrospinal fluid leak"[All Fields] OR "cerebrospinal fluid hypovolemia"[All Fields] OR "cerebrospinal fluid hypovolemia syndrome"[All Fields] OR "Hypoliquorrhea"[All Fields] OR "Spontaneous spinal cerebrospinal fluid leak"[All Fields]) NOT "letter to the editor"[All Fields]&filter=dates.1000/1/1-2022/3/31&filter=lang.english&ac=no&format=abstract&sort=date&size=200'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('article', class_="article-overview")

with open('disstest.csv', 'w', encoding= 'utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Herkunftsland', 'Journal', 'Anzahl Zitationen']
    thewriter.writerow(header)

    for list in lists:
        herkunftsland = lists.find('ul', class_="item-list").text.replace('\n','')
        journal = lists.find('div', class_="article-source").text.replace('\n', '')
        zitationen = lists.find('li', class_="references-count").text.replace('\n', '')
        info = [herkunftsland, journal, zitationen]
        thewriter.writerow(info)

I am getting the following messages:

Traceback (most recent call last):
  File "/Users/***/Documents/Test/scrape.py", line 17, in <module>
    herkunftsland = lists.find('ul', class_="item-list").text.replace('\n','')
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/bs4/element.py", line 2289, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

CodePudding user response:

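Inside the loop you call find() on lists, which is the whole ResultSet returned by find_all(), instead of on the single element of the current iteration. Give the loop variable its own name (here _list) and call find() on that: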

for _list in lists:
    herkunftsland = _list.find('ul', class_="item-list").text.replace('\n', '')
    journal = _list.find('div', class_="article-source").text.replace('\n', '')
    zitationen = _list.find('li', class_="references-count").text.replace('\n', '')
    info = [herkunftsland, journal, zitationen]
    thewriter.writerow(info)
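
To make the difference concrete, here is a minimal sketch on a made-up HTML snippet. The markup is hypothetical and only borrows the class names from the question; the real PubMed page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not the actual PubMed HTML.
html = """
<article class="article-overview">
  <div class="article-source">Journal A</div>
</article>
<article class="article-overview">
  <div class="article-source">Journal B</div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
articles = soup.find_all("article", class_="article-overview")

# find_all() returns a ResultSet (a list of Tag objects), not a single Tag.
print(type(articles).__name__)     # ResultSet
print(type(articles[0]).__name__)  # Tag

# Calling find() on the ResultSet itself fails, as in the question.
try:
    articles.find("div", class_="article-source")
    raised = False
except AttributeError:
    raised = True

# Calling find() on each Tag inside the ResultSet works.
journals = [
    article.find("div", class_="article-source").get_text(strip=True)
    for article in articles
]
print(journals)
```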

CodePudding user response:

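As mentioned by @Charls Ken, you used the wrong variable (lists) to extract your data. You should also avoid naming a variable list: it is not a reserved keyword, but it shadows Python's built-in list type.

I would also recommend checking that an element was actually found before calling methods on it: find() returns None when nothing matches, and accessing .text on None raises an AttributeError.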

for _list in lists:
    # find() returns None when nothing matches, so fall back to None instead of calling .text on it
    herkunftsland = e.text.replace('\n', '') if (e := _list.find('ul', class_="item-list")) else None
    journal = e.text.replace('\n', '').strip() if (e := _list.find('div', class_="article-source")) else None
    zitationen = e.text.replace('\n', '').strip() if (e := _list.find('li', class_="references-count")) else None
    info = [herkunftsland, journal, zitationen]
    thewriter.writerow(info)

Note: This uses the walrus operator (:=), which requires Python 3.8 or later.

Without the walrus operator:

journal = _list.find('div', class_="article-source").text.replace('\n','').strip() if _list.find('div', class_="article-source") else None
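
If the repeated conditional gets unwieldy, the None check could also be moved into a small helper. This is only a sketch: clean_text is a hypothetical name, not part of Beautiful Soup, and it assumes the argument is either None or an object with a get_text() method, as a bs4 Tag has.

```python
def clean_text(element):
    """Return the element's text with runs of whitespace collapsed, or None.

    `element` is expected to be the result of a find() call: either a
    Tag-like object exposing get_text(), or None when nothing matched.
    """
    if element is None:
        return None
    # split() with no arguments drops newlines and repeated spaces.
    return " ".join(element.get_text().split())
```

Inside the loop, each field then becomes a single call, for example journal = clean_text(_list.find('div', class_="article-source")), and the searched element is only looked up once.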