driver = webdriver.Chrome(service=s)
url="https://fourminutebooks.com/book-summaries/"
driver.get(url)
page_tabs = driver.find_elements(By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")
#html = driver.find_elements(By.CSS_SELECTOR,"header[class='entry-header page-header']")
length_page_tabs = len(page_tabs)
in_length = len(page_tabs)
for i in range(length_page_tabs):
    ran = random.randint(0,in_length)
    page_tabs[ran].click()
    driver.execute_script("window.history.go(-1)")
    time.sleep(10)
    #need to get page source of html and then open it to a new file, extract what I want and add it to the email
I am trying to click one of the links, get the HTML code, email it to myself, and then go back a page and repeat. However, after clicking the first random link, the code stops working and I get an error instead.
CodePudding user response:
You have to be very careful when you store a collection of elements in a variable and then iterate over it while performing actions on the page.
page_tabs = driver.find_elements...
All the elements in this case are cached, and any browser action such as navigating to another page or refreshing the page makes all of these cached elements stale. In other words, the references are out of date and can no longer be used to interact with the page.
So, to avoid stale element reference errors, you have to either prevent any page reloads or locate the elements again every time the page state has changed.
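A minimal sketch of the second option, locating the elements again after every navigation. The helper name `visit_links_fresh` and the `on_page` callback are made up for illustration and are not part of Selenium; `driver` is assumed to behave like a Selenium WebDriver (`find_elements`, `back`). On a real site an explicit wait after going back may also be needed.

```python
def visit_links_fresh(driver, by, selector, on_page, limit=None):
    """Click each matching link in turn, re-locating the list before every click.

    driver:  assumed to behave like a Selenium WebDriver (find_elements, back)
    on_page: hypothetical callback invoked while the linked page is open
    limit:   optional maximum number of links to visit
    """
    visited = 0
    index = 0
    while limit is None or visited < limit:
        # Fresh lookup on every pass: references from an earlier page load are never reused.
        links = driver.find_elements(by, selector)
        if index >= len(links):
            break
        links[index].click()
        on_page(driver)
        driver.back()  # back on the list page; references located before this point are stale
        index += 1
        visited += 1
    return visited
```

With Selenium this might be called as `visit_links_fresh(driver, By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']", lambda d: print(d.title))`.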
CodePudding user response:
StaleElementReferenceException
StaleElementReferenceException is a type of WebDriverException that is thrown when a reference to an element has gone stale, i.e. the element no longer appears in the HTML DOM of the page.
Some of the possible causes of StaleElementReferenceException include:
- You are no longer on the same page, or the page may have refreshed since the element was last located.
- The element may have been removed and re-added to the DOM tree since it was located, for example when an element is relocated. This typically happens with a JavaScript framework when values are updated and the node is rebuilt.
- The element may have been inside an iframe or another context that was refreshed.
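Since any of these causes can occur between locating an element and using it, one defensive pattern is to locate the element again and retry when the reference turns out to be stale. The sketch below is not a Selenium API: `with_fresh_element`, `locate`, and `action` are hypothetical names, and the exception type is passed in as a parameter so the helper does not depend on any particular library.

```python
def with_fresh_element(locate, action, stale_errors, attempts=3):
    """Run action(locate()) and retry with a newly located element if it went stale.

    locate:       zero-argument callable returning a freshly located element
    action:       callable applied to that element
    stale_errors: exception type (or tuple of types) that signals a stale reference
    attempts:     maximum number of tries before the error is re-raised
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        element = locate()  # always look the element up again, never reuse an old reference
        try:
            return action(element)
        except stale_errors:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
```

With Selenium, `stale_errors` would be `StaleElementReferenceException` from `selenium.common.exceptions`, and `locate` could be a lambda wrapping `driver.find_element(...)`. Retrying only helps when the element still exists on the current page; it does not help after navigating away for good.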
This use case
In your use case, you have created a list of WebElements, i.e. page_tabs, using the locator strategy:
page_tabs = driver.find_elements(By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")
Next, within the loop, whenever you invoke click on page_tabs[ran] you are redirected to a new page, so the elements in the list page_tabs become stale and new elements are loaded.
When you then invoke driver.execute_script("window.history.go(-1)") you move back to the main page, where the elements matched by page_tabs were present, and they are loaded again as new elements. At this point the list page_tabs still holds the WebElements from the previous search, which are now stale. Hence, during the second iteration, you face StaleElementReferenceException.
Solution
In your use case, the desired elements are <a> tags, so to avoid StaleElementReferenceException you can store the href attributes in a list instead of the elements themselves, and then invoke get(href) for each one, as follows:
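from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://fourminutebooks.com/book-summaries/")
# Collect the href strings up front; plain strings cannot go stale.
links = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")))
hrefs = [my_elem.get_attribute("href") for my_elem in links]
for href in hrefs:
    driver.get(href)
    print("Placeholder to perform the desired operations on the respective page")
driver.quit()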
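Building on the href approach, the original goal (open a random summary and capture its HTML) could be sketched as follows. `save_random_pages`, the `count` parameter, and the output file names are assumptions made for illustration; `driver` is assumed to be a Selenium WebDriver, of which only `get` and `page_source` are used. Using `random.sample` also sidesteps an off-by-one in `random.randint(0, in_length)`: the upper bound of `randint` is inclusive, so it can produce an index one past the end of the list.

```python
import random
from pathlib import Path


def save_random_pages(driver, hrefs, count, out_dir, rng=random):
    """Visit up to `count` distinct random URLs from `hrefs` and save each page's HTML.

    Returns the list of files written. The page_N.html naming is an arbitrary
    choice for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pool = list(hrefs)
    # sample() picks without replacement and never yields an out-of-range index.
    chosen = rng.sample(pool, min(count, len(pool)))
    written = []
    for number, href in enumerate(chosen, start=1):
        driver.get(href)  # direct navigation: no element references are held, so none can go stale
        target = out_dir / f"page_{number}.html"
        target.write_text(driver.page_source, encoding="utf-8")
        written.append(target)
    return written
```

For example, `save_random_pages(driver, hrefs, 3, "summaries")` would be called with the `hrefs` list collected above and before `driver.quit()`; the saved files can then be parsed and attached to the email.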