I am trying to scrape the title, link, and abstract of articles from a search page on a website we own. I first tried Google Sheets for this, but because the site is dynamic I was encouraged to try Selenium with Python. However, I am getting nowhere. I am trying to scrape content from https://resources.norrag.org/categories/591,595 and want to return the titles and links of the two case studies.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
s=Service('C:/Users/xxxx/Downloads/chromedriver_win32/chromedriver.exe')
driver = webdriver.Chrome(service=s)
url='https://resources.norrag.org/categories/591,595'
driver.get(url)
element = driver.find_element("xpath", '//div[@id="article_search_results"]//a')
print(element)
driver.close()
Here is the error message:
> --------------------------------------------------------------------------- NoSuchElementException Traceback (most recent call
> last) Input In [8], in <cell line: 10>()
> 6 url='https://resources.norrag.org/categories/591,595'
> 7 driver.get(url)
> ---> 10 element = driver.find_element("xpath", '//div[@id="article_search_results"]//a')
> 12 print(element)
> 13 driver.close()
>
> File
> ~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:857,
> in WebDriver.find_element(self, by, value)
> 854 by = By.CSS_SELECTOR
> 855 value = '[name="%s"]' % value
> --> 857 return self.execute(Command.FIND_ELEMENT, {
> 858 'using': by,
> 859 'value': value})['value']
>
> File
> ~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py:435,
> in WebDriver.execute(self, driver_command, params)
> 433 response = self.command_executor.execute(driver_command, params)
> 434 if response:
> --> 435 self.error_handler.check_response(response)
> 436 response['value'] = self._unwrap_value(
> 437 response.get('value', None))
> 438 return response
>
> File
> ~\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py:247,
> in ErrorHandler.check_response(self, response)
> 245 alert_text = value['alert'].get('text')
> 246 raise exception_class(message, screen, stacktrace, alert_text) # type: ignore[call-arg] # mypy is not smart enough here
> --> 247 raise exception_class(message, screen, stacktrace)
>
> NoSuchElementException: Message: no such element: Unable to locate
> element:
> {"method":"xpath","selector":"//div[@id="article_search_results"]//a"}
> (Session info: chrome=103.0.5060.114) Stacktrace: Backtrace: Ordinal0
> [0x00575FD3 2187219] Ordinal0 [0x0050E6D1 1763025] Ordinal0
> [0x00423E78 802424] Ordinal0 [0x00451C10 990224] Ordinal0
> [0x00451EAB 990891] Ordinal0 [0x0047EC92 1174674] Ordinal0
> [0x0046CBD4 1100756] Ordinal0 [0x0047CFC2 1167298] Ordinal0
> [0x0046C9A6 1100198] Ordinal0 [0x00446F80 946048] Ordinal0
> [0x00447E76 949878] GetHandleVerifier [0x008190C2 2721218]
> GetHandleVerifier [0x0080AAF0 2662384] GetHandleVerifier
> [0x0060137A 526458] GetHandleVerifier [0x00600416 522518] Ordinal0
> [0x00514EAB 1789611] Ordinal0 [0x005197A8 1808296] Ordinal0
> [0x00519895 1808533] Ordinal0 [0x005226C1 1844929]
> BaseThreadInitThunk [0x76B5FA29 25]
> RtlGetAppContainerNamedObjectPath [0x77007A9E 286]
> RtlGetAppContainerNamedObjectPath [0x77007A6E 238]
CodePudding user response:
Looking at the page source with the browser's inspection tool, you can see that the two links have the class library-document-summary.
Searching for these elements and returning their text and href attribute should work:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
s=Service('C:/Users/xxxx/Downloads/chromedriver_win32/chromedriver.exe')
driver = webdriver.Chrome(service=s)
url='https://resources.norrag.org/categories/591,595'
driver.get(url)
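# Select every link that carries the class seen in the inspector
elements = driver.find_elements(By.XPATH, '//a[@class="library-document-summary"]')
for e in elements:
    print(e.get_attribute("href"))  # link target
    print(e.text)                   # visible title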
This yields:
https://resources.norrag.org/resource/696/towards-better-skills-development-in-the-vietnam-2018-general-education-curriculum
Towards Better Skills Development in the Vietnam 2018 General Education Curriculum
https://resources.norrag.org/resource/577/vietnam-national-education-for-all-efa-action-plan-2003-2015
Vietnam National Education for All (EFA) Action Plan 2003 - 2015
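Because the page is dynamic, the links may not exist yet at the moment find_elements runs. A minimal sketch of the same lookup with an explicit wait is below. The function names collect_links and scrape_links, the 10-second timeout, the optional driver_path parameter, and the CSS selector a.library-document-summary (a CSS equivalent of the class match above) are assumptions for illustration, not something confirmed against the site.

```python
def collect_links(elements):
    """Build a list of {"title", "url"} dicts from Selenium-like elements.

    Works with any objects exposing .text and .get_attribute(name).
    """
    results = []
    for e in elements:
        href = e.get_attribute("href")
        if href:  # skip anchors that have no link target
            results.append({"title": e.text.strip(), "url": href})
    return results


def scrape_links(url, driver_path=None, timeout=10):
    """Open url in Chrome, wait for the result links, and return them."""
    # Imported here so collect_links stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: with no driver_path, a recent Selenium 4 release
    # locates a suitable chromedriver on its own.
    if driver_path:
        driver = webdriver.Chrome(service=Service(driver_path))
    else:
        driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumed selector: anchors with the class named in the answer above.
        locator = (By.CSS_SELECTOR, "a.library-document-summary")
        # Wait up to `timeout` seconds until at least one link is in the DOM.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located(locator)
        )
        return collect_links(elements)
    finally:
        driver.quit()  # always close the browser, even on a timeout
```

Calling scrape_links('https://resources.norrag.org/categories/591,595') should then return one dict per case study, provided the class name still matches the page. If no matching link appears within the timeout, WebDriverWait raises a TimeoutException instead of returning an empty list.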