I have been trying to scrape a webpage and found something odd. When I call Python Selenium WebDriver's find_element_by_xpath without time.sleep, I get nothing back. But if I add time.sleep, I suddenly get the information I want.
I first noticed this pattern when I ran the code without time.sleep and got nothing, but then got the result when I ran the same code a second time. So I tried adding a short pause, and suddenly the code worked perfectly.
Here is the code without time.sleep
driver.get(link)
info = driver.find_element_by_xpath('//*[@id="page-number"]').text
print(info)
Here is the one with time.sleep
driver.get(link)
time.sleep(1)
info = driver.find_element_by_xpath('//*[@id="page-number"]').text
print(info)
I understand that I should provide the actual website address to get the best answer, but I would rather not reveal which website I am scraping.
Could someone explain to me theoretically why this might happen?
CodePudding user response:
When you use .sleep(), you pause the code for x seconds. Since the pause is what lets your code work, the page is probably not finished loading by the time your script tries to read the element.
.sleep() works well when the delay is known and fixed, but you may want to use Selenium's explicit wait instead, which waits up to x seconds for a specific element to "appear", whatever the reason for the delay, before timing out. That way you don't have to hard-code a sleep before each element lookup unless you know there is a specific amount of time it needs to wait. See the link below.
https://www.geeksforgeeks.org/explicit-waits-in-selenium-python/
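The idea behind an explicit wait can be sketched without a browser: instead of sleeping for a fixed time, the code polls a condition until it succeeds or a timeout expires. The helper name wait_until, its parameters, and the simulated page_number_text function below are assumptions made up for illustration; they are not part of Selenium, whose own WebDriverWait polls in a broadly similar way.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration only; it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return as soon as the condition is met, without waiting any longer.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Simulate an element whose text only becomes available 0.3 seconds after "page load".
ready_at = time.monotonic() + 0.3


def page_number_text():
    # Stand-in for an element lookup: returns None until the content exists.
    return "Page 1" if time.monotonic() >= ready_at else None


info = wait_until(page_number_text, timeout=2.0, poll_interval=0.05)
print(info)
```

Unlike a hard-coded sleep, this returns as soon as the value is available and fails with a clear error if it never appears.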
CodePudding user response:
There could be any number of reasons why a sleep helps find the element. Selenium blocks code progress while it waits for the page to finish loading (that is, for the browser to report document.readyState equal to complete). Once the page load is complete, there may still be any number of processes running on the page (scripts, for example) that keep portions of it from being fully loaded.
NOTE: Using sleep is bad practice. You should instead use WebDriverWait to wait for the element to be in the desired state. In the case of your sample code, you would use:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get(link)
# Wait up to 10 seconds for the element to become visible, then read its text
info = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "page-number"))).text
print(info)