I wrote the following code to scrape the text of the element <h3 class="h4 mb-10">Total nodes: 1,587</h3> from https://blockchair.com/dogecoin/nodes:
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

path = "/usr/local/bin/chromedriver"
driver = webdriver.Chrome(path)
driver.get("https://blockchair.com/dogecoin/nodes")

def scraping_fnd():
    try:
        #nodes = driver.find_element_by_class_name("h4 mb-10")
        #NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".h4 mb-10"}. There are 3 elements of this class
        #nodes = WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.CLASS_NAME, "h4 mb-10")))#selenium.common.exceptions.TimeoutException: Message:
        nodes = WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.CSS_SELECTOR, ".h4 mb-10")))#selenium.common.exceptions.TimeoutException: Message:
        nodes = nodes.text
        print(nodes)
    finally:
        driver.quit()#Closes the tab even when return is executed

scraping_fnd()
I'm aware that there are lighter options than Selenium for scraping this target, but the code above is just a snippet from a larger script that relies on Selenium for its other tasks. So please limit the scope of the answers to Selenium only.
Although there are three elements of the class "h4 mb-10" on the page, I am unable to locate the element. When I call driver.find_element_by_class_name("h4 mb-10"), I get:
Traceback (most recent call last):
  File "./protocols.py", line 34, in <module>
    scraping_fnd()
  File "./protocols.py", line 20, in scraping_fnd
    nodes = driver.find_element_by_class_name("h4 mb-10")#(f"//span[@title = \"{name}\"]")
  File "/home/jerzy/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "/home/jerzy/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/jerzy/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/jerzy/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".h4 mb-10"}
  (Session info: chrome=90.0.4430.212)
Applying waits, currently commented out in the snippet, was to no avail. I came across this question and so I tried calling WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.CSS_SELECTOR, ".h4 mb-10"))). I got:
Traceback (most recent call last):
  File "./protocols.py", line 33, in <module>
    scraping_fnd()
  File "./protocols.py", line 23, in scraping_fnd
    nodes = WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.CSS_SELECTOR, ".h4 mb-10")))#selenium.common.exceptions.TimeoutException: Message:
  File "/home/jerzy/.local/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I have no clue what I am doing wrong. Is it possible to scrape the target with Selenium without using XPath?
CodePudding user response:
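If you want to use a CSS selector, it would be
.h4.mb-10
In a CSS selector a space is the descendant combinator, so ".h4 mb-10" looks for an <mb-10> element inside an element of class h4, which matches nothing. By.CLASS_NAME accepts only a single class name, so you can use it with h4 or mb-10 alone if either one uniquely identifies the element. Otherwise, use XPath:
//h3[@class="h4 mb-10"]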
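As a rough sketch of how the .h4.mb-10 selector could be used with Selenium only: the snippet below waits for all matching headings and returns the one whose text contains "Total nodes". The helper names classes_to_css and scrape_total_nodes are made up for illustration, and the code assumes that webdriver.Chrome() can find a chromedriver on its own and that the page still uses the markup shown in the question.

```python
def classes_to_css(class_attr):
    """Turn a class attribute such as 'h4 mb-10' into the selector '.h4.mb-10'."""
    return "".join("." + name for name in class_attr.split())


def scrape_total_nodes(url="https://blockchair.com/dogecoin/nodes", timeout=10):
    """Return the text of the 'Total nodes' heading, or None if it is not found."""
    # Imported here so the selector helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions

    driver = webdriver.Chrome()  # assumption: chromedriver is discoverable
    try:
        driver.get(url)
        selector = classes_to_css("h4 mb-10")
        # Several elements share these classes, so collect all of them.
        headings = WebDriverWait(driver, timeout).until(
            expected_conditions.presence_of_all_elements_located(
                (By.CSS_SELECTOR, selector)
            )
        )
        for heading in headings:
            if "Total nodes" in heading.text:
                return heading.text
        return None
    finally:
        driver.quit()  # runs even when a return statement was executed
```

Calling print(scrape_total_nodes()) should then print the heading text, provided the assumptions above hold. Filtering by text is used here because the class pair alone matches more than one element on the page.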
CodePudding user response:
Try the below (note that selenium is not involved in the solution):
import requests
from bs4 import BeautifulSoup
r = requests.get('https://blockchair.com/dogecoin/nodes')
soup = BeautifulSoup(r.content, 'html.parser')
lst = soup.find_all('h3',{"class": "h4 mb-10"})
lst = [h for h in lst if 'Total' in h.text]
print(lst[0].text)
output
Total nodes: 1,593
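If the count is needed as a number rather than as text, the heading string can be parsed afterwards. This is a minimal sketch that assumes the text keeps the "Total nodes: 1,593" shape shown above; parse_node_count is a hypothetical helper name, not part of any library.

```python
import re


def parse_node_count(heading_text):
    """Extract the integer from text like 'Total nodes: 1,593'; None if absent."""
    match = re.search(r"Total nodes:\s*(\d[\d,]*)", heading_text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(1).replace(",", ""))
```

The same helper works on the text returned by Selenium's element.text, since it only depends on the string.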