I am having a terrible time getting text from a shadow root. I found several docs and my code looks right to me, except I always get:
```
Traceback (most recent call last):
  File "C:\python\vttest.py", line 29, in <module>
    print(driver.execute_script("return document.querySelector('vt-ui-shell').shadowRoot.querySelector('url-view').shadowRoot.querySelector('vt-ui-main-generic-report').shadowRoot.querySelector('vt-ui-url-card').shadowRoot.querySelector('vt-ui-generic-card').shadowRoot.querySelector('p')").text)
  File "C:\Users\barberion.NATJ\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 634, in execute_script
    return self.execute(command, {
  File "C:\Users\barberion.NATJ\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\barberion.NATJ\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: Cannot read properties of null (reading 'shadowRoot')
  (Session info: chrome=96.0.4664.93)
```
```python
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import DesiredCapabilities # necessary in headless mode, need this library to accept ssl
from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import WebDriverWait
#from selenium.webdriver.support.select import Select
capabilities = DesiredCapabilities.CHROME.copy()
capabilities['acceptSslCerts'] = True
capabilities['acceptInsecureCerts'] = True
opts = Options()
#opts.headless = True # comment this out to get out of headless
#opts.add_argument("--window-size=1400x1400")
opts.add_argument("--enable-javascript")
opts.add_argument("--start-maximized")
opts.add_argument('--disable-gpu')
searchvt=input("Enter URL: ")
driver = Chrome(options=opts,desired_capabilities=capabilities,executable_path='C:/python/chromedriver.exe')
driver.get('https://www.virustotal.com/gui/home/url')
url='//*[@id="view-container"]/home-view'
time.sleep(.5)
driver.find_element_by_xpath(url).send_keys(searchvt)
time.sleep(.5)
driver.find_element_by_xpath(url).send_keys(Keys.RETURN)
driver.maximize_window()
driver.set_page_load_timeout(5)
time.sleep(3)
print(driver.execute_script("return document.querySelector('vt-ui-shell').shadowRoot.querySelector('url-view').shadowRoot.querySelector('vt-ui-main-generic-report').shadowRoot.querySelector('vt-ui-url-card').shadowRoot.querySelector('vt-ui-generic-card').shadowRoot.querySelector('p')").text)
```
My error must be somewhere in the `print` call :( Any help is much appreciated!
Edit: To be more specific, I want to submit a URL or domain to VirusTotal and then read the result text at the top of the page. For example, if you submit MSN.com, the text I want is at the top, where it says "No security vendors flagged this URL as malicious".
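The message `Cannot read properties of null (reading 'shadowRoot')` means that one of the `querySelector(...)` calls in the chain returned `null`, so reading `.shadowRoot` on the next hop failed. The single long expression does not tell you which hop it was. As a hedged sketch (the names `WALK_SHADOW_CHAIN_JS` and `locate_via_chain` are hypothetical helpers, not part of Selenium), the chain can instead be walked one hop at a time inside the browser, returning either the element or the first selector that failed:

```python
# Hypothetical diagnostic helper (not part of Selenium): walks a selector chain one
# hop at a time inside the browser and reports the first hop that fails.
WALK_SHADOW_CHAIN_JS = """
const steps = arguments[0];
let root = document;
for (let i = 0; i < steps.length; i++) {
  const [selector, enterShadow] = steps[i];
  const el = root.querySelector(selector);
  if (el === null) {
    return {failedAt: i, selector: selector, reason: 'no match'};
  }
  if (i === steps.length - 1) {
    return {found: el};
  }
  if (enterShadow) {
    // shadowRoot is null when the element has no shadow root or a closed one
    if (el.shadowRoot === null) {
      return {failedAt: i, selector: selector, reason: 'no open shadowRoot'};
    }
    root = el.shadowRoot;
  } else {
    root = el;
  }
}
return {failedAt: -1, selector: null, reason: 'empty chain'};
"""


def locate_via_chain(driver, steps):
    """steps: list of (css_selector, enter_shadow_root) pairs; the last pair is the target."""
    result = driver.execute_script(WALK_SHADOW_CHAIN_JS, [list(step) for step in steps])
    if "found" in result:
        # Selenium converts a DOM node returned from JavaScript into a WebElement
        return result["found"]
    raise LookupError(
        f"step {result['failedAt']} ({result['selector']!r}) failed: {result['reason']}"
    )
```

With this helper, the question's chain would be written as `[('vt-ui-shell', True), ('url-view', True), ('vt-ui-main-generic-report', True), ('vt-ui-url-card', True), ('vt-ui-generic-card', True), ('p', False)]`, and the raised `LookupError` names the first selector that matched nothing.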
Answer:
If you update to Selenium 4.0 and use a Chromium browser v96 or later, you can use the new `shadow_root` property in Python Selenium and avoid JavaScript entirely.
This is a working example:
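```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://watir.com/examples/shadow_dom.html')
shadow_host = driver.find_element(By.CSS_SELECTOR, '#shadow_host')
shadow_root = shadow_host.shadow_root  # ShadowRoot object (Selenium 4, Chromium 96+)
shadow_content = shadow_root.find_element(By.CSS_SELECTOR, '#shadow_content')
assert shadow_content.text == 'some text'
driver.quit()
```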
Source: https://titusfortner.com/2021/11/22/shadow-dom-selenium.html
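For a page like VirusTotal, where the target sits several shadow roots deep, the same property can be chained in a loop. This is a sketch under assumptions: `find_through_shadow_hosts` is a hypothetical helper, and it takes the host selectors from the working JavaScript chain in the next answer, which may stop matching if the site changes. It sticks to CSS selectors because XPath is generally not supported for lookups inside a shadow root.

```python
# By.CSS_SELECTOR in Selenium is the string "css selector"; it is spelled out here so
# the helper needs no import, but passing By.CSS_SELECTOR works identically.
CSS_SELECTOR = "css selector"


def find_through_shadow_hosts(driver, host_selectors, target_selector):
    """Enter the shadow root of each host in turn, then find the target inside the last one."""
    context = driver
    for selector in host_selectors:
        host = context.find_element(CSS_SELECTOR, selector)
        # Selenium 4 ShadowRoot; Selenium raises an exception if the host has no shadow root
        context = host.shadow_root
    return context.find_element(CSS_SELECTOR, target_selector)
```

Assuming that page structure, `find_through_shadow_hosts(driver, ["url-view", "vt-ui-url-card"], "vt-ui-generic-card p").text` would read the verdict text. The final descendant selector works because `vt-ui-generic-card` and its `<p>` share the same shadow tree.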
Answer:
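My shadow DOM lookups were simply out of order. Once I changed the `print` to:
```python
print(driver.execute_script("return document.querySelector('url-view').shadowRoot.querySelector('vt-ui-url-card').shadowRoot.querySelector('vt-ui-generic-card').querySelector('p')").text)
```
everything worked as expected. Compared with the question, the working chain starts at `url-view` (found directly in `document`), skips `vt-ui-shell` and `vt-ui-main-generic-report`, and reads the `<p>` from the light DOM of `vt-ui-generic-card` rather than from its shadow root.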
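The working one-liner still relies on `time.sleep(3)`, and because the report is rendered asynchronously, a hop may still be `null` if the page is slow. A hedged sketch, assuming the same selectors as the working chain: rewrite the expression with JavaScript optional chaining (`?.`, supported by Chrome 96) so it evaluates to `null` instead of throwing, and poll until it returns an element. `VERDICT_JS` and `wait_for_script_result` are hypothetical names; Selenium's own `WebDriverWait(driver, timeout).until(...)`, already imported in the question, provides the same kind of polling.

```python
import time

# Optional chaining (?.) makes the expression evaluate to null (None in Python)
# instead of throwing "Cannot read properties of null" while the page is still rendering.
VERDICT_JS = (
    "return document.querySelector('url-view')?.shadowRoot"
    "?.querySelector('vt-ui-url-card')?.shadowRoot"
    "?.querySelector('vt-ui-generic-card')"
    "?.querySelector('p') ?? null;"
)


def wait_for_script_result(driver, script, timeout=15.0, poll=0.5):
    """Re-run `script` until it returns a non-None value, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = driver.execute_script(script)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"script returned no result within {timeout}s")
        time.sleep(poll)
```

With the question's `driver`, `wait_for_script_result(driver, VERDICT_JS).text` would stand in for the `time.sleep(3)` and `print(...)` pair.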