I am working on an office project where I need to scrape data to check the active status on different websites, but whenever I try to get the data it sometimes returns None and sometimes raises this AttributeError. I followed the steps from YouTube videos but still get the error. Help, please.
# Python code
from bs4 import BeautifulSoup
import requests
html_text = requests.get(
"https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw").text
soup = BeautifulSoup(html_text, 'lxml')
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})
para = status.find('p').text
print(para)
CodePudding user response:
The URL is dynamic, meaning the data is populated by JavaScript, so you need an automation tool such as Selenium.
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw'

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.get(url)
time.sleep(10)  # give the JavaScript time to render the page

# parse the rendered page source instead of the raw requests response
soup = BeautifulSoup(driver.page_source, 'lxml')
#driver.close()
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})
para = status.find('p').text
print(para)
Output:
Active
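Note: in newer Selenium 4 releases the executable_path argument was removed, so passing the driver path directly to webdriver.Chrome() no longer works; the path goes through a Service object instead. A minimal sketch of the same setup under that assumption:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: wrap the downloaded driver path in a Service object
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)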
CodePudding user response:
You have the most common problem: modern pages use JavaScript to add elements, but requests/BeautifulSoup can't run JavaScript. So soup.find('div', ...) gives None instead of the expected element, and later that causes the problem with None.find('p').
You may use Selenium to control a real web browser, which can run JavaScript.
from selenium import webdriver
#from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#from selenium.common.exceptions import NoSuchElementException, TimeoutException
#from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.firefox import GeckoDriverManager

url = "https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw"

#driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
driver.get(url)

#status = driver.find_element(By.XPATH, '//div[contains(@class, "ValidatorInfo_statusBadge")]')

# wait until JavaScript has rendered the status badge instead of sleeping a fixed time
wait = WebDriverWait(driver, 10)
status = wait.until(EC.visibility_of_element_located((By.XPATH, '//div[contains(@class, "ValidatorInfo_statusBadge")]')))
print(status.text)
Eventually you should check if the page offers an API to get the data.
You may also use DevTools (tab: Network) to check whether the JavaScript reads the data from some URL, and then try that URL with requests. It can work faster than Selenium, but the server may detect the script/bot and block it.
JavaScript usually gets the data as JSON, so you may not even need to scrape HTML with BeautifulSoup.
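As a rough illustration of that approach (the endpoint below is purely hypothetical; copy the real request URL and headers from the Network tab yourself):
import requests

# hypothetical endpoint taken from DevTools > Network; replace it with the real one you observe
url = "https://api.example.com/v1/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw"

# some servers block requests without a browser-like User-Agent
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers)
response.raise_for_status()

data = response.json()   # the response is JSON, so no BeautifulSoup needed
print(data)              # inspect the structure, then pick out the status field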
CodePudding user response:
The error message you are getting is pretty clear and self-explanatory.
AttributeError: 'NoneType' object has no attribute 'find'
Whenever you call .find you are assuming the variable you are calling it on actually contains something. But sometimes it contains nothing, which Python represents with None.
Example: para = status.find('p')
You are assuming the previous line successfully extracted the status from the soup variable. But sometimes it doesn't find it, so the value of the status variable is None.
In that case the line para = status.find('p') effectively becomes
para = None.find('p')
and hence the error.
To avoid this, you should check the variable to see if it got something before calling find on it.
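A minimal sketch of that check, reusing the variable names from the question:
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})

if status is not None:
    # only call .find('p') once we know the badge div was actually found
    para = status.find('p').text
    print(para)
else:
    print("Status badge not found - the page was probably rendered by JavaScript")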