I am using Selenium in Python 3 to get the page source of a site that uses JavaScript. When I run it interactively in an iPython shell, it works as I expect it to. However, when the exact same script is executed non-interactively, the page source is not fully rendered (the JavaScript components aren't rendered). What could be the reason for this? I am running the exact same code on the exact same machine (a headless Linux server).
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
WINDOW_SIZE = "1920,1080"
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size={0}".format(WINDOW_SIZE))
chrome_options.add_argument("--no-sandbox")
service = Service('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service, options=chrome_options)
driver.get("https://www.stakingrewards.com/staking/?page=1&sort=rank_ASC")
src = driver.page_source
# Check page source length
print(len(src))
# Quit all windows related to the driver instance
driver.quit()
The output from the iPython shell is 220101, which is expected, while the output from the script executed on the command line ($ python script.py) is 38265. Thus, I am not effectively rendering the JavaScript components when I invoke the script from the command line. Why?!
CodePudding user response:
The problem is not with running it interactively or as a script.
Your code doesn't give the driver any time to render all the elements before it reads page_source, so the source comes back incomplete. In the interactive shell there was most likely a slightly longer gap between driver.get() and driver.page_source, which let more of the JavaScript run and produced a larger page source. Even that was not the fully rendered page: using Waits, I was able to get a much larger page source (around 650k).
Waits can be used to block until a required element is present, visible, and so on. In your case I'm assuming that element is the main table. The code below waits for the table to become visible and then prints the length of the page source.
Code snippet:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
driver.get("https://www.stakingrewards.com/staking/?page=1&sort=rank_ASC")
try:
    # wait for the table data to be visible
    delay = 20  # timeout in seconds
    WebDriverWait(driver, delay).until(EC.visibility_of_element_located((By.CLASS_NAME, 'rt-tbody')))
    print(len(driver.page_source))
except TimeoutException:
    # until() raises TimeoutException if the element is not visible within the delay
    print("Timeout!!!")
driver.quit()
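If you don't know which element to wait for, one rough alternative is to poll the page source until its length stops changing. The sketch below is only an illustration of that idea, not part of Selenium: the helper name wait_for_stable_source and its parameters (timeout, poll, settle_checks) are made up for this example, and it only assumes the object passed in has a page_source attribute, as a Selenium driver does.

```python
import time


def wait_for_stable_source(driver, timeout=20.0, poll=0.5, settle_checks=3):
    """Poll driver.page_source until its length stops changing.

    Hypothetical helper (not a Selenium API). Returns the page source once
    its length has matched the previous reading `settle_checks` times in a
    row, or raises TimeoutError if that does not happen within `timeout`
    seconds.
    """
    deadline = time.monotonic() + timeout
    last_len = -1   # length seen on the previous poll
    unchanged = 0   # consecutive polls with the same length
    while time.monotonic() < deadline:
        src = driver.page_source
        if len(src) == last_len:
            unchanged += 1
            if unchanged >= settle_checks:
                return src
        else:
            # The page is still growing (or shrinking): start counting again.
            last_len = len(src)
            unchanged = 0
        time.sleep(poll)
    raise TimeoutError("page source did not settle within %.1f s" % timeout)
```

You would call it right after driver.get(...), for example src = wait_for_stable_source(driver). This is a heuristic: a page that keeps updating itself may never settle, so an explicit wait on a known element, as above, is usually the more reliable choice when you can identify one.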