Description of the situation: I have a script that scrolls inside a frame in order to extract information from it. The list has roughly this structure:
<ul>
  <li> </li>
  <li> </li>
  <li> </li>
  <li> </li>
  <li> </li>
  ...
</ul>
The list contains about 30 items. When scrolling, no new <li> elements are added; the existing ones are only updated, so the DOM structure does not grow.
Explaining the problem:
When the script scrolls, it has to re-extract all of the <li> elements on every iteration, because they are refreshed in place.
Here is the scrolling and element-extraction logic I use:
import time
from typing import List

from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

SCROLL_PAUSE_TIME = 5

# Get the initial scroll height of the list viewport
last_height = driver.execute_script(
    "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")

all_msgs_loaded = False
while not all_msgs_loaded:
    li_elements: List[WebElement] = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    driver.execute_script("document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")
    # Wait for the updated items to load
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate the new scroll height and compare it with the last one
    new_height = driver.execute_script(
        "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    if new_height == last_height:
        all_msgs_loaded = True
    last_height = new_height
On every iteration li_elements receives about 30 WebElements. If I comment out the line with find_elements, the script runs for hours without any increase in RAM consumption. Note that I do not keep anything at runtime, and there is no growth in memory consumption anywhere else.
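As a diagnostic (this is not from the original post), one way to rule out lingering Python-side references to the stale WebElements is to drop them explicitly and force a garbage collection at the end of each iteration; process_batch below is a hypothetical placeholder for whatever extraction is actually done:

import gc

while not all_msgs_loaded:
    li_elements = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    process_batch(li_elements)  # hypothetical placeholder for the real extraction
    # Drop the references to the stale remote elements before fetching
    # the next batch, then force a collection.
    del li_elements
    gc.collect()
    # ... rest of the scroll loop unchanged ...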
Another way I used to get li_elements is through self._driver.execute_script().
Example:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

li_elements = self._driver.execute_script(
    "return document.querySelectorAll('li[data-tid=\"pane-item\"]');",
    WebDriverWait(self._driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, "//li[@data-tid='pane-item']"))
    )
)
With both methods I get the same result, and the RAM growth is the same: RAM grows indefinitely until the system kills the process.
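One way to narrow this down (not from the original post; it assumes psutil is installed and a locally started driver) is to log, per iteration, the resident memory of the Python script and of the browser/driver process tree separately:

import psutil

def log_memory(driver):
    # RSS of the Python script itself
    script_rss = psutil.Process().memory_info().rss
    # RSS of the driver process and its children (the browser), found
    # through the PID of the locally started driver service
    service = psutil.Process(driver.service.process.pid)
    browser_rss = service.memory_info().rss + sum(
        child.memory_info().rss for child in service.children(recursive=True))
    print(f"script: {script_rss >> 20} MiB, browser: {browser_rss >> 20} MiB")

If only the browser side grows, the leak is in the page or the browsing context rather than in the Python code.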
I analyzed the internal structure of these functions, but I did not find anything that could be inflating the RAM usage.
Another option would be find_elements_by_css_selector(), but internally it just calls find_elements().
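For reference, the two calls below are equivalent (the underscore-style method delegates to find_elements() and is deprecated as of Selenium 4):

# Both return the same elements; the first simply delegates to the second.
items = driver.find_elements_by_css_selector('li[data-tid="pane-item"]')
items = driver.find_elements(By.CSS_SELECTOR, 'li[data-tid="pane-item"]')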
I also tried different combinations with sleep(), but nothing helps; the RAM never decreases.
Can you please explain what is actually happening? I do not understand why the RAM consumption keeps increasing.
Is there another method of extracting the elements that does not consume RAM?
CodePudding user response:
Under normal circumstances Selenium's find_elements() method shouldn't consume that much RAM. Most likely it is the browsing context, e.g. google-chrome, that consumes more RAM as you scrollIntoView(), in case the <li>
items get updated through JavaScript or AJAX.
Without any visibility into the DOM tree it is difficult to pinpoint the actual cause or a remediation. However, a similar discussion suggests adding some waits in terms of time.sleep(n).
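As a sketch, the fixed time.sleep(n) pause could also be replaced with an explicit wait, so the loop only blocks until the refreshed items are actually present (selector taken from the question; the 20 s timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block until at least one refreshed pane item is present,
# instead of always sleeping for the full pause interval.
WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.XPATH, "//li[@data-tid='pane-item']"))
)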
CodePudding user response:
Try getting just what you need instead of the full element:
lis = driver.execute_script("""
return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerText)
""")
I can't tell what you're doing with them, but if you're appending elements to a big array and there are enough of them, you will eventually hit a RAM limit.
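A sketch of how this could slot into the question's scroll loop (names and selectors come from the question; handle_batch is a hypothetical placeholder). Because only plain strings cross the Selenium bridge, no WebElement references accumulate on the Python side:

import time

SCROLL_PAUSE_TIME = 5
last_height = driver.execute_script(
    "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
all_msgs_loaded = False

while not all_msgs_loaded:
    # Returns a list of strings, not remote element handles.
    texts = driver.execute_script(
        "return [...document.querySelectorAll('li[data-tid=\"pane-item\"]')]"
        ".map(li => li.innerText);")
    handle_batch(texts)  # hypothetical placeholder for the real processing
    driver.execute_script(
        "document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script(
        "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    all_msgs_loaded = new_height == last_height
    last_height = new_height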