I am working on an office project where I need to scrape data to check the active status on different websites, but whenever I try to get the data it sometimes returns None and sometimes raises this AttributeError. I followed the steps from YouTube videos but still get the error. Help, please.
# Python code
from bs4 import BeautifulSoup
import requests
html_text = requests.get(
"https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw").text
soup = BeautifulSoup(html_text, 'lxml')
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})
para = status.find('p').text
print(para)
CodePudding user response:
The URL is dynamic, meaning the data is populated by JavaScript, so you need an automation tool such as Selenium.
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw'

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.get(url)
time.sleep(10)  # give the JavaScript time to render the page

# parse the rendered page source instead of the raw requests response
soup = BeautifulSoup(driver.page_source, 'lxml')
#driver.close()
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})
para = status.find('p').text
print(para)
Output:
Active
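Note: in newer Selenium 4 releases the executable_path argument was removed, so passing the driver path directly to webdriver.Chrome() no longer works; the path goes through a Service object instead. A minimal sketch of the same setup under that assumption:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: wrap the downloaded driver path in a Service object
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)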
CodePudding user response:
You have the most common problem: modern pages use JavaScript to add elements, but requests/BeautifulSoup can't run JavaScript. So soup.find('div', ...) gives None instead of the expected element, and later that causes the problem with None.find('p').
You may use Selenium to control a real web browser, which can run JavaScript.
from selenium import webdriver
#from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#from selenium.common.exceptions import NoSuchElementException, TimeoutException
#from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.firefox import GeckoDriverManager

url = "https://www.mintscan.io/cosmos/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw"

#driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
driver.get(url)

#status = driver.find_element(By.XPATH, '//div[contains(@class, "ValidatorInfo_statusBadge")]')

# wait until JavaScript has rendered the status badge instead of sleeping a fixed time
wait = WebDriverWait(driver, 10)
status = wait.until(EC.visibility_of_element_located((By.XPATH, '//div[contains(@class, "ValidatorInfo_statusBadge")]')))
print(status.text)
Eventually you should check if the page offers an API to get the data.
You may also use DevTools (tab: Network) to check whether the JavaScript reads the data from some URL, and then try that URL with requests. It can work faster than Selenium, but the server may detect the script/bot and block it.
JavaScript usually gets the data as JSON, so you may not even need to scrape HTML with BeautifulSoup.
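As a rough illustration of that approach (the endpoint below is purely hypothetical; copy the real request URL and headers from the Network tab yourself):
import requests

# hypothetical endpoint taken from DevTools > Network; replace it with the real one you observe
url = "https://api.example.com/v1/validators/cosmosvaloper1we6knm8qartmmh2r0qfpsz6pq0s7emv3e0meuw"

# some servers block requests without a browser-like User-Agent
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers)
response.raise_for_status()

data = response.json()   # the response is JSON, so no BeautifulSoup needed
print(data)              # inspect the structure, then pick out the status field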
CodePudding user response:
The error message you are getting is pretty clear and self-explanatory.
AttributeError: 'NoneType' object has no attribute 'find'
Whenever you call .find you are assuming the variable you are calling it on actually contains something. But sometimes it contains nothing, which Python represents with None.
Example: para = status.find('p')
You are assuming the previous line successfully extracted the status from the soup variable. But sometimes it doesn't find it, so the value of the status variable is None.
In that case the line para = status.find('p') effectively becomes
para = None.find('p')
and hence the error.
To avoid this, you should check the variable to see if it got something before calling find on it.
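A minimal sketch of that check, reusing the variable names from the question:
status = soup.find('div', {'class': "ValidatorInfo_statusBadge__PBIGr"})

if status is not None:
    # only call .find('p') once we know the badge div was actually found
    para = status.find('p').text
    print(para)
else:
    print("Status badge not found - the page was probably rendered by JavaScript")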