How to use selenium to scrape from multiple page clicks

Time:08-24

I am writing a Selenium script that will get all filenames in every directory on a website. My approach is to build a list of directory elements and .click() each one in the list, one by one, to access its filenames. The problem I face is that Selenium does not let me click on the next directory after the 1st. The following code shows my approach:

folders = driver.find_elements_by_class_name("directory")

for folder in folders:
    folder.click()
    time.sleep(2)
    # click below is to navigate back to root directory
    driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a').click()
    time.sleep(2)

With the above code, I get the following error when Selenium tries to click on the 2nd directory in the list:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

CodePudding user response:

When you open a link and move to another page, the previously collected elements become stale.
To overcome this you need to collect the folders list again each time you come back from the opened page to the main page.
So your code could look like the following:

folders = driver.find_elements_by_class_name("directory")

for index in range(len(folders)):
    folders[index].click()
    # do what you need to do on the opened page
    # then get back to the main page, e.g. via the breadcrumb link from your code
    driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a').click()
    time.sleep(2)
    # collect the `folders` list again so the references are fresh
    folders = driver.find_elements_by_class_name("directory")
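
A related option, if the order of the entries can change between visits, is to avoid holding WebElement references at all: collect something stable up front (for example each directory's visible text) and re-locate the element by that value on every pass. This is only a sketch under the assumption that each directory entry has unique, stable visible text; the breadcrumb XPath is the one from the question, and the text-matching XPath may need adjusting to your page's actual markup:

import time

# sketch only: assumes each directory entry has unique, stable visible text
folder_names = [f.text for f in driver.find_elements_by_class_name("directory")]

for name in folder_names:
    # re-locate the entry fresh on the current page so it is never stale
    folder = driver.find_element_by_xpath(
        '//*[contains(concat(" ", normalize-space(@class), " "), " directory ")'
        ' and normalize-space(.)="{}"]'.format(name))
    folder.click()
    time.sleep(2)
    # collect the filenames you need, then go back to the root directory
    driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a').click()
    time.sleep(2)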

CodePudding user response:

This means that the element you are trying to click isn't in the DOM anymore.

A possible solution is to use WebDriverWait, something like this:

from selenium.webdriver.support.ui import WebDriverWait

secs = 2

def waitUntilFound(driver):
    # WebDriverWait calls this repeatedly (ignoring NoSuchElementException by
    # default) until it returns a truthy value or the timeout expires
    return driver.find_element_by_xpath('//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a')

element = WebDriverWait(driver, secs).until(waitUntilFound)
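
As a usage note, the same idea can be combined with the first answer: Selenium's built-in expected_conditions helpers can wait for the directory list and for the breadcrumb link instead of relying on fixed time.sleep() calls. The locators below (the "directory" class name and the breadcrumb XPath) come from the question; the rest is a sketch, not a drop-in solution:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10)
back_xpath = '//*[@id="default-layout"]/div[1]/div/div/div[1]/div[1]/nav/ol/a'

folders = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "directory")))
for index in range(len(folders)):
    # re-fetch the list each pass and wait until it is present again
    folders = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "directory")))
    folders[index].click()
    # ... read the filenames on the opened page ...
    # wait for the breadcrumb link to be clickable, then navigate back
    wait.until(EC.element_to_be_clickable((By.XPATH, back_xpath))).click()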