Selenium WebDriver crashes when I try to get innerText of an element container

Time:02-06

I am trying to get the innerText of the '.message.spoilers-container' elements, but when I scroll up the web page, the program crashes and gives me an error.

Code:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

def find_message_container(driver):
    try:
        elements = driver.execute_script("return document.querySelectorAll('.message.spoilers-container')")
        unique_texts = set()

        for element in elements:
            text = element.get_attribute("innerText")

            if text not in unique_texts:
                unique_texts.add(text)

            with open("unique_texts.txt", "w") as file:
                for text in unique_texts:
                    file.write("\n" + text + "\n")

    except NoSuchElementException as e:
        print('Could not find the given element container. The following exception was raised:\n', e)
        pass
    
    return unique_texts

Error:

Traceback (most recent call last):
  File "c:\~\Desktop\Project\file.py", line 11, in find_message_container
    text = element.get_attribute("innerText")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webelement.py", line 179, in get_attribute   
    attribute_value = self.parent.execute_script(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 506, in execute_script   
    return self.execute(command, {"script": script, "args": converted_args})["value"]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 444, in execute
    self.error_handler.check_response(response)
  File "C:\~\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 249, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=109.0.5414.120)

What could cause this problem? The website I am testing this on is Web Telegram. Every time new chats are loaded by scrolling up, a new container appears.

Any help would be appreciated. I tried some wait statements and wait.until, but it did not work.

Answer:

The core exception is StaleElementReferenceException...

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

...which implies that between the moment document.querySelectorAll() returns the NodeList of matching elements and the moment your loop reads each element's innerText, some of those elements have become stale, because new chats are loaded into a new container. In short, the references you are holding no longer point to nodes in the current DOM tree.
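A minimal sketch of a different workaround, which is not part of this answer: because a single execute_script() call runs synchronously in the page, reading innerText inside the same script means Python receives plain strings instead of element references, so nothing is left to go stale. It assumes driver is a Selenium WebDriver (only its execute_script() method is used, and extra arguments are exposed to the script as arguments[0], arguments[1], ...); the helper name collect_message_texts is hypothetical, and keeping first-seen order is an assumption about what you want.

```python
# JavaScript that maps every matching element to its innerText in one pass,
# so the browser returns a list of strings rather than element references.
READ_TEXTS_JS = """
return Array.from(
    document.querySelectorAll(arguments[0]),
    el => el.innerText
);
"""


def collect_message_texts(driver, selector=".message.spoilers-container"):
    """Return the unique message texts in the order they first appear.

    `driver` is assumed to be a Selenium WebDriver; only execute_script()
    is called. The selector is passed to the script as arguments[0].
    (Hypothetical helper, not part of Selenium.)
    """
    texts = driver.execute_script(READ_TEXTS_JS, selector)
    # dict.fromkeys() drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(texts))
```

Usage would be texts = collect_message_texts(driver). Each call still only sees the messages currently in the DOM, so you would call it again after every scroll.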


Solution

A possible solution would be to use WebDriverWait with visibility_of_all_elements_located(), using either of the following locator strategies:

  • Using CSS_SELECTOR:

    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".message.spoilers-container")))
    
  • Using XPATH:

    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='message spoilers-container']")))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Answer:

I don't have a Web Telegram account, so I can't test this, but I would change the following:

  1. The main issue is the StaleElementReferenceException. A stale element is an element you stored in a variable before the page changed; when you then call .click() or read .text on it, the exception is thrown. Once the page changes, the reference you had is gone: it now points to an element that is no longer attached to the document. A quick example of how this happens:

    element = driver.find_element(*locator)  # got a reference; locator is a (By, value) tuple
    # while doing stuff, the page changes
    value = element.text  # accessing the element using .text throws the exception
    

    To avoid this, refetch the reference before accessing it:

    element = driver.find_element(*locator)
    # while doing stuff, the page changes
    element = driver.find_element(*locator)  # refetch the element
    value = element.text
    

    In your case, this is happening because of the loop through the messages. You create your list before the loop, so if the elements change while looping, the exception is thrown. The way to fix this is to refetch the elements within the loop.

    for element in driver.find_elements(...):
    

    One potentially big problem: if you are in a fast-moving chat with a constant stream of new messages, your script may not be able to keep up, since the page DOM seems to change with each new message. NOTE: This is an assumption based on your comments.

  2. Prefer the native API over driver.execute_script() for finding elements. Replace

    elements = driver.execute_script("return document.querySelectorAll('.message.spoilers-container')")
    

    with

    elements = driver.find_elements(By.CSS_SELECTOR, '.message.spoilers-container')
    
  3. Use .text instead of .get_attribute("innerText"). Replace

    text = element.get_attribute("innerText")
    

    with

    text = element.text
    
  4. Writing a file is a relatively slow operation, and your code rewrites the file on every pass through the loop. I would write it once, after the loop is done.

  5. Why are you returning unique_texts if you've already written them to file?
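Point 1's advice to refetch the elements within the loop can be sketched as follows. This is a minimal sketch under stated assumptions, not a tested Web Telegram solution: the element list is refetched on every step, the current element is picked by position, and a read that goes stale is retried a few times before giving up. find_all, stale_exc and max_retries are hypothetical parameters; in real use find_all would be something like lambda: driver.find_elements(By.CSS_SELECTOR, '.message.spoilers-container') and stale_exc would be Selenium's StaleElementReferenceException.

```python
def read_texts_refetching(find_all, stale_exc, max_retries=3):
    """Read .text from every element, refetching the list on each step.

    find_all:    zero-argument callable returning the current list of
                 elements (hypothetical; e.g. a lambda wrapping
                 driver.find_elements).
    stale_exc:   exception class raised when a reference has gone stale
                 (with Selenium, StaleElementReferenceException).
    max_retries: consecutive stale reads tolerated before re-raising.
    """
    unique_texts = set()
    index = 0
    retries = 0
    while True:
        elements = find_all()          # fresh references on every step
        if index >= len(elements):
            break                      # reached the end of the current list
        try:
            unique_texts.add(elements[index].text)
        except stale_exc:
            retries += 1
            if retries > max_retries:
                raise                  # DOM keeps changing; give up
            continue                   # refetch and retry the same index
        retries = 0
        index += 1
    return unique_texts
```

The trade-off is speed: refetching the whole list costs one extra find_elements() round trip per message. Also, because positions shift when older messages are prepended as you scroll up, it can still skip a message; the set only absorbs repeats.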

Here's my rewrite of your code based on these suggestions:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def find_message_container(driver):
    try:
        unique_texts = set()
        for element in driver.find_elements(By.CSS_SELECTOR, '.message.spoilers-container'):
            message = element.text
            if message not in unique_texts:
                unique_texts.add(message)

        # Write the file once, after the loop has finished
        with open("unique_texts.txt", "w") as file:
            for text in unique_texts:
                file.write("\n" + text + "\n")

    except NoSuchElementException as e:
        # Note: find_elements() returns an empty list rather than raising this
        print('Could not find the given element container. The following exception was raised:\n', e)