Description of the situation: I have a script that scrolls inside a frame in order to extract information from it. The list has roughly this structure:
<ul>
  <li> </li>
  <li> </li>
  <li> </li>
  <li> </li>
  <li> </li>
  ...
</ul>
The list contains about 30 items. When scrolling, no new <li> elements are added; the existing ones are only updated, so the DOM structure does not grow.
Explaining the problem:
When the script scrolls, it has to re-extract all of the <li> elements on every iteration, because they are refreshed in place.
Here is the scrolling and element-extraction logic I use:
import time
from typing import List

from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

SCROLL_PAUSE_TIME = 5

# Get the initial scroll height of the list viewport
last_height = driver.execute_script(
    "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")

all_msgs_loaded = False
while not all_msgs_loaded:
    li_elements: List[WebElement] = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    driver.execute_script("document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")
    # Wait for the updated items to load
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate the new scroll height and compare it with the last one
    new_height = driver.execute_script(
        "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    if new_height == last_height:
        all_msgs_loaded = True
    last_height = new_height
On every iteration li_elements receives about 30 WebElements. If I comment out the line with find_elements, the script runs for hours without any increase in RAM consumption. Note that I do not keep anything at runtime, and there is no growth in memory consumption anywhere else.
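As a diagnostic (this is not from the original post), one way to rule out lingering Python-side references to the stale WebElements is to drop them explicitly and force a garbage collection at the end of each iteration; process_batch below is a hypothetical placeholder for whatever extraction is actually done:

import gc

while not all_msgs_loaded:
    li_elements = driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    process_batch(li_elements)  # hypothetical placeholder for the real extraction
    # Drop the references to the stale remote elements before fetching
    # the next batch, then force a collection.
    del li_elements
    gc.collect()
    # ... rest of the scroll loop unchanged ...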
Another way I used to get li_elements is through self._driver.execute_script().
Example:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

li_elements = self._driver.execute_script(
    "return document.querySelectorAll('li[data-tid=\"pane-item\"]');",
    WebDriverWait(self._driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, "//li[@data-tid='pane-item']"))
    )
)
With both methods I get the same result, and the RAM growth is the same: RAM grows indefinitely until the system kills the process.
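One way to narrow this down (not from the original post; it assumes psutil is installed and a locally started driver) is to log, per iteration, the resident memory of the Python script and of the browser/driver process tree separately:

import psutil

def log_memory(driver):
    # RSS of the Python script itself
    script_rss = psutil.Process().memory_info().rss
    # RSS of the driver process and its children (the browser), found
    # through the PID of the locally started driver service
    service = psutil.Process(driver.service.process.pid)
    browser_rss = service.memory_info().rss + sum(
        child.memory_info().rss for child in service.children(recursive=True))
    print(f"script: {script_rss >> 20} MiB, browser: {browser_rss >> 20} MiB")

If only the browser side grows, the leak is in the page or the browsing context rather than in the Python code.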
I analyzed the internal structure of these functions, but I did not find anything that could be inflating the RAM usage.
Another option would be find_elements_by_css_selector(), but internally it just calls find_elements().
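For reference, the two calls below are equivalent (the underscore-style method delegates to find_elements() and is deprecated as of Selenium 4):

# Both return the same elements; the first simply delegates to the second.
items = driver.find_elements_by_css_selector('li[data-tid="pane-item"]')
items = driver.find_elements(By.CSS_SELECTOR, 'li[data-tid="pane-item"]')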
I also tried different combinations with sleep(), but nothing helps; the RAM never decreases.
Can you please explain what is actually happening? I do not understand why the RAM consumption keeps increasing.
Is there another method of extracting the elements that does not consume RAM?
CodePudding user response:
Under normal circumstances Selenium's find_elements() method shouldn't consume that much RAM. Most likely it is the browsing context, e.g. google-chrome, that consumes more RAM as you scrollIntoView(), in case the <li>
items get updated through JavaScript or AJAX.
Without any visibility into the DOM tree it is difficult to pinpoint the actual cause or a remediation. However, a similar discussion suggests adding some waits in terms of time.sleep(n).
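As a sketch, the fixed time.sleep(n) pause could also be replaced with an explicit wait, so the loop only blocks until the refreshed items are actually present (selector taken from the question; the 20 s timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block until at least one refreshed pane item is present,
# instead of always sleeping for the full pause interval.
WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.XPATH, "//li[@data-tid='pane-item']"))
)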
CodePudding user response:
Try getting just what you need instead of the full element:
lis = driver.execute_script("""
return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerText)
""")
I can't tell what you're doing with them, but if you're appending elements to a big array and there are enough of them, you will eventually hit a RAM limit.
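A sketch of how this could slot into the question's scroll loop (names and selectors come from the question; handle_batch is a hypothetical placeholder). Because only plain strings cross the Selenium bridge, no WebElement references accumulate on the Python side:

import time

SCROLL_PAUSE_TIME = 5
last_height = driver.execute_script(
    "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
all_msgs_loaded = False

while not all_msgs_loaded:
    # Returns a list of strings, not remote element handles.
    texts = driver.execute_script(
        "return [...document.querySelectorAll('li[data-tid=\"pane-item\"]')]"
        ".map(li => li.innerText);")
    handle_batch(texts)  # hypothetical placeholder for the real processing
    driver.execute_script(
        "document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script(
        "return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    all_msgs_loaded = new_height == last_height
    last_height = new_height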