AttributeError: 'NoneType' object has no attribute 'find' when scrapping an arra

Time:02-24

I have the following code:

from bs4 import BeautifulSoup
import requests

root = 'https://br.investing.com'
website = f'{root}/news/latest-news'

result = requests.get(website, headers={"User-Agent": "Mozilla/5.0"})
content = result.text
soup = BeautifulSoup(content, 'lxml')

box = soup.find('section', id='leftColumn')
links = [link['href'] for link in box.find_all('a', href=True)]

for link in links:
  result = requests.get(f'{root}/{link}', headers={"User-Agent": "Mozilla/5.0"})
  content = result.text
  soup = BeautifulSoup(content, 'lxml')

  box = soup.find('section', id='leftColumn')
  title = box.find('h1').get_text()

  with open('headlines.txt', 'w') as file:
    file.write(title)

With this code I intend to scrape the URLs of news articles from a website, visit each URL, get its header (the h1 title) and write the headers to a text file. Instead, I get only one header in the file and the error AttributeError: 'NoneType' object has no attribute 'find'. What can be done about this?

CodePudding user response:

In your for loop, at the line title = box.find('h1').get_text(), box is None (i.e. its type is NoneType), which is why the error says a 'NoneType' object has no attribute 'find'.

This probably happens because, for at least one page in the loop, the line box = soup.find('section', id='leftColumn') finds no matching element and returns None.

When box is None, the next line raises the error.

You can fix this by checking if box is not None before calling find. So this:

box = soup.find('section', id='leftColumn')
title = box.find('h1').get_text()

will change to

box = soup.find('section', id='leftColumn')
if box is not None:
    title = box.find('h1').get_text()
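
One thing to watch with the guard: if the file write stays outside the if block, a page without the section will either reuse the title from the previous page or fail because title was never set. Here is a minimal sketch of one way to handle that, by moving the lookup into a helper that returns None when something is missing. The name extract_title and the sample HTML strings are made up for illustration, and it uses the built-in 'html.parser' instead of 'lxml' only so that it runs without extra packages.

```python
from bs4 import BeautifulSoup


def extract_title(html, parser="html.parser"):
    """Return the <h1> text inside <section id="leftColumn">, or None if either is missing.

    extract_title is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, parser)
    box = soup.find("section", id="leftColumn")
    if box is None:
        # The page has no such section, so there is nothing to search in.
        return None
    heading = box.find("h1")
    if heading is None:
        # The section exists but contains no <h1>.
        return None
    return heading.get_text(strip=True)


# Made-up pages: the first has the expected layout, the second does not.
sample_pages = [
    '<section id="leftColumn"><h1>Some headline</h1></section>',
    "<div><h1>Different layout</h1></div>",
]

titles = []
for html in sample_pages:
    title = extract_title(html)
    if title is None:
        continue  # skip this page instead of crashing or reusing an old title
    titles.append(title)
```

In the original loop you would pass result.text to the helper and only write to the file when the returned value is not None.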

EDIT:

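The reason you're seeing only one header is the 'w' mode here: with open('headlines.txt', 'w')

'w' truncates the file every time it is opened, and your loop reopens the file on each iteration, so each write replaces the previous one. I haven't seen the file's contents, but I would guess it holds the last header written before the error.

To fix: replace 'w' with 'a', which appends title to the existing file content. Also write a newline after each title (for example title + '\n'), otherwise the headers run together on one line. You can read about it here: https://www.w3schools.com/python/python_file_write.asp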
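
To make the difference between the two file modes concrete, here is a small sketch with two options: appending one title per call, or opening the file once and writing all titles inside the same with block. The function names append_title and save_titles are made up for illustration, and the demo writes to a temporary directory rather than a real headlines.txt.

```python
import os
import tempfile


def append_title(title, path):
    """Append one title on its own line; existing content is kept ('a' mode)."""
    with open(path, "a", encoding="utf-8") as file:
        file.write(title + "\n")


def save_titles(titles, path):
    """Open the file once in 'w' mode and write every title through the same handle.

    The file is truncated only once, when it is opened, so no title is lost.
    """
    with open(path, "w", encoding="utf-8") as file:
        for title in titles:
            file.write(title + "\n")


# Demo in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "headlines.txt")
    for title in ["First headline", "Second headline"]:
        append_title(title, path)
    with open(path, encoding="utf-8") as file:
        appended = file.read()  # both titles, one per line
```

Keep in mind that 'a' mode also keeps whatever earlier runs of the script wrote, so the file grows across runs. Opening it once with 'w' before the loop starts each run with a clean file.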