How can I get html code from a webpage which is generated in apx-root tags?

Time:10-17

I am trying to retrieve the HTML of a web page (https://registry.verra.org/app/projectDetail/VCS/812), specifically the markup that is generated inside the apx-root tag in the body (line 18 of the page source).

So far I have tried using Selenium to select the element by its tag name, apx-root, and print the generated HTML, but without success.

from selenium import webdriver

url = "https://registry.verra.org/app/projectDetail/VCS/812"
driver = webdriver.Chrome()
driver.get(url)
elem = driver.find_elements_by_css_selector('apx-root')
print(elem[0].get_attribute('innerHTML'))

Can anybody help? Thanks a lot.

EDIT:

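I had to wait for the page to load before I could access the code inside the tag.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://registry.verra.org/app/projectDetail/VCS/812"
driver = webdriver.Chrome()
driver.get(url)
delay = 5  # seconds
try:
    # 'an_element i need' is a placeholder for the id of an element rendered inside apx-root
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'an_element i need')))
except TimeoutException:
    print("Loading took too much time!")
apx_root = driver.find_element(By.XPATH, '/html/body/apx-root')
html = apx_root.get_attribute("innerHTML")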
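
If no particular child id is known in advance, a possible variation on the wait above is a custom condition that succeeds as soon as apx-root contains any rendered markup. This is only a sketch: the names inner_html_is_rendered and fetch_rendered_html are made up for illustration, and it assumes that non-empty innerHTML is a good enough signal that the app has finished rendering, which may not hold for every page.

```python
def inner_html_is_rendered(css_selector):
    """Build a wait condition that is truthy once the first element
    matching css_selector has non-empty innerHTML."""
    def _condition(driver):
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", css_selector)
        if not elements:
            return False
        html = elements[0].get_attribute("innerHTML")
        # Returning the HTML itself lets WebDriverWait.until() hand it back.
        return html if html and html.strip() else False
    return _condition


def fetch_rendered_html(url, css_selector="apx-root", timeout=10):
    """Hypothetical helper: open url in Chrome and return the rendered innerHTML."""
    # Imported lazily so the condition above can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Raises selenium.common.exceptions.TimeoutException if nothing renders in time.
        return WebDriverWait(driver, timeout).until(inner_html_is_rendered(css_selector))
    finally:
        driver.quit()
```

Calling fetch_rendered_html("https://registry.verra.org/app/projectDetail/VCS/812") would then return the markup inside apx-root, or raise a TimeoutException if the tag stays empty for the whole timeout.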

CodePudding user response:

If I'm understanding you correctly, you want to get the innerHTML contents of the apx-root tag.

Rather than using css_selector, let's use XPATH.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://registry.verra.org/app/projectDetail/VCS/812")

#Use XPATH to find the apx-root tag
apx_root = driver.find_element(By.XPATH, '/html/body/apx-root')

#If you want the inner HTML not including apx-root
page_inner_HTML = apx_root.get_attribute("innerHTML")

#If you want the outer HTML including apx-root
page_outer_HTML = apx_root.get_attribute("outerHTML")

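Also, the find_element_by_* and find_elements_by_* methods are deprecated in Selenium 4 (and removed as of 4.3.0). Use find_element or find_elements with a By locator instead, for example driver.find_elements(By.CSS_SELECTOR, 'apx-root') in place of driver.find_elements_by_css_selector('apx-root').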
