I am trying to retrieve the code of a web page (https://registry.verra.org/app/projectDetail/VCS/812), specifically the HTML that is generated inside the apx-root tag in the body (line 18 of the page source).
So far I have tried using Selenium to select the element by its tag name, apx-root, and print the generated HTML, but without success.
from selenium import webdriver
url = "https://registry.verra.org/app/projectDetail/VCS/812"
driver = webdriver.Chrome()
driver.get(url)
elem = driver.find_elements_by_css_selector('apx-root')
print(elem[0].get_attribute('innerHTML'))
Can anybody help? Thanks a lot.
EDIT:
I had to wait for the page to load before I could access the code inside the tag.
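from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
url = "https://registry.verra.org/app/projectDetail/VCS/812"
driver = webdriver.Chrome()
driver.get(url)
delay = 5  # maximum number of seconds to wait
try:
    # Block until the awaited element is present in the DOM
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'an_element i need')))
except TimeoutException:
    print("Loading took too much time!")
apx_root = driver.find_element(By.XPATH, '/html/body/apx-root')
html = apx_root.get_attribute("innerHTML")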
CodePudding user response:
If I'm understanding you correctly, you want to get the innerHTML contents of the apx-root tag.
Rather than using css_selector, let's use XPATH.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://registry.verra.org/app/projectDetail/VCS/812")
# Use XPATH to find the apx-root tag
apx_root = driver.find_element(By.XPATH, '/html/body/apx-root')
# If you want the inner HTML, not including apx-root
page_inner_HTML = apx_root.get_attribute("innerHTML")
# If you want the outer HTML, including apx-root
page_outer_HTML = apx_root.get_attribute("outerHTML")
Also, Selenium's find_element_by_* and find_elements_by_* methods were deprecated in Selenium 4 (and removed in later 4.x releases); use find_element or find_elements with a By locator instead.
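One caveat: the question says apx-root is already in the page source, so waiting for the tag to be present may not be enough, because its content is generated afterwards. Below is a minimal sketch of polling until the tag's innerHTML is non-empty. The helper name wait_for_inner_html and the timeout and poll values are made up for illustration, and it only assumes the driver has Selenium's find_elements(by, value) method. Selenium's own WebDriverWait with a custom callable is the more usual way to do this.

```python
import time


def wait_for_inner_html(driver, locator, timeout=10.0, poll_interval=0.5):
    """Poll until the located element has non-empty innerHTML, then return it.

    `driver` is anything with a Selenium-style `find_elements(by, value)` method;
    `locator` is a `(by, value)` tuple such as `(By.CSS_SELECTOR, "apx-root")`.
    Raises the built-in TimeoutError if nothing was rendered in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list (rather than raising) when nothing matches
        elements = driver.find_elements(*locator)
        if elements:
            html = elements[0].get_attribute("innerHTML")
            # Treat whitespace-only content as "not rendered yet"
            if html and html.strip():
                return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{locator!r} still empty after {timeout} s")
        time.sleep(poll_interval)
```

With a real driver, after driver.get(url), a call might look like html = wait_for_inner_html(driver, (By.CSS_SELECTOR, "apx-root"), timeout=15). This sketch does not handle elements that go stale between polls.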