I have been trying to scrape data from the following site using Selenium & Python: https://developer.salesforce.com/docs/atlas.en-us.netzero_cloud_dev_guide.meta/netzero_cloud_dev_guide/sforce_api_objects_airtravelemssnfctr.htm#maincontent.
I want to obtain the Fields table but nothing seems to be working. Anytime I try to get any element from the site I get the following error:
NoSuchElementException: Message: no such element: Unable to locate element
It has since been pointed out to me that this happens because the element I want to access is nested inside two #shadow-root (open) nodes. I am having trouble figuring out how to access it.
I am referencing this article: How to handle elements inside Shadow DOM from Selenium. However, I am having a hard time altering the code for my needs.
Does anyone know how I can access the element I need behind the Shadow DOM? Desperately need help with this as I cannot figure it out.
Attempting to use the code below but I can't figure out which elements I should be referencing in each root variable in the find_element_by_id function. If anyone has advice on how
from selenium import webdriver
driver = webdriver.Chrome()
def expand_shadow_element(element):
    # Return the open shadow root attached to the given host element
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root
driver.get("chrome://settings")
root1 = driver.find_element_by_tag_name('settings-ui')
shadow_root1 = expand_shadow_element(root1)
root2 = shadow_root1.find_element_by_css_selector('[page-name="Settings"]')
shadow_root2 = expand_shadow_element(root2)
root3 = shadow_root2.find_element_by_id('search')
shadow_root3 = expand_shadow_element(root3)
search_button = shadow_root3.find_element_by_id("searchTerm")
search_button.click()
text_area = shadow_root3.find_element_by_id('searchInput')
text_area.send_keys("content settings")
root0 = shadow_root1.find_element_by_id('main')
shadow_root0_s = expand_shadow_element(root0)
root1_p = shadow_root0_s.find_element_by_css_selector('settings-basic-page')
shadow_root1_p = expand_shadow_element(root1_p)
root1_s = shadow_root1_p.find_element_by_css_selector('settings-privacy-page')
shadow_root1_s = expand_shadow_element(root1_s)
content_settings_div = shadow_root1_s.find_element_by_css_selector('#site-settings-subpage-trigger')
content_settings = content_settings_div.find_element_by_css_selector("button")
content_settings.click()
CodePudding user response:
@JenPal An example with ShadowRoot for the same website is below. This should help arrive at a solution in your case. Make sure your Selenium version is 4.x:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(options=chrome_options, service=s)
driver.get(
"https://developer.salesforce.com/docs/atlas.en-us.netzero_cloud_dev_guide.meta/netzero_cloud_dev_guide/sforce_api_objects_airtravelemssnfctr.htm#maincontent")
main_content = driver.find_element(By.CSS_SELECTOR, "main[id='maincontent'] doc-xml-content")
main_content_shadow_root = main_content.shadow_root
doc_content = main_content_shadow_root.find_element(By.CSS_SELECTOR, "doc-content")
doc_content_shadow_root = doc_content.shadow_root
table = doc_content_shadow_root.find_element(By.CSS_SELECTOR, ".featureTable.sort_table")
print(table.tag_name)
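The same pattern extends to any nesting depth. Below is a minimal sketch, not part of the original answer: a hypothetical helper named find_in_shadow that takes the shadow-host selectors in order (outermost first) and walks them through Selenium 4's shadow_root property. It passes the literal locator string "css selector", which is the value of By.CSS_SELECTOR, so the helper itself needs no imports.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def find_in_shadow(root, host_selectors, target_selector):
    """Walk nested open shadow roots, then locate the target element.

    root: a WebDriver (or any object exposing find_element)
    host_selectors: CSS selectors of the shadow hosts, outermost first
    target_selector: CSS selector of the element inside the innermost root
    """
    context = root
    for selector in host_selectors:
        host = context.find_element(CSS, selector)
        # Selenium 4 exposes an open shadow root as a searchable context.
        context = host.shadow_root
    return context.find_element(CSS, target_selector)
```

With the driver from the answer above, the call would presumably look like find_in_shadow(driver, ["main[id='maincontent'] doc-xml-content", "doc-content"], ".featureTable.sort_table").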
CodePudding user response:
The <table> element with the heading Fields is nested within multiple #shadow-root (open) nodes.
Solution
To extract the contents of the table, use shadowRoot.querySelector() with the following locator strategy:
Code Block:
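import time
from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
options = Options()
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://developer.salesforce.com/docs/atlas.en-us.netzero_cloud_dev_guide.meta/netzero_cloud_dev_guide/sforce_api_objects_airtravelemssnfctr.htm#maincontent')
time.sleep(5)  # crude wait for the web components to render
# Pierce both open shadow roots with JavaScript and return the table element
table_data = driver.execute_script("""return document.querySelector('doc-xml-content').shadowRoot.querySelector('doc-content').shadowRoot.querySelector('table.featureTable')""")
# Wrap the HTML in StringIO: newer pandas versions deprecate literal HTML strings
print(pd.read_html(StringIO(table_data.get_attribute("outerHTML"))))
driver.quit()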
Console Output:
[    Field                          Details
  0  Ch4PsgrKmLongHaulInKgCo2e      Type double Properties Create, Filter, Nill...
  1  Ch4PsgrKmMediumHaulInKgCo2e    Type double Properties Create, Filter, Nill...
  2  Ch4PsgrKmShortHaulInKgCo2e     Type double Properties Create, Filter, Nill...
  3  Ch4PsgrMileLongHaulInKgCo2e    Type double Properties Create, Filter, Nill...
  4  Ch4PsgrMileMediumHaulInKgCo2e  Type double Properties Create, Filter, Nill...
  5  Ch4PsgrMileShortHaulInKgCo2e   Type double Properties Create, Filter, Nill...
  6  Co2PsgrKmLongHaulInKg          Type double Properties Create, Filter, Nill...
  7  Co2PsgrKmMediumHaulInKg        Type double Properties Create, Filter, Nill...
  8  Co2PsgrKmShortHaulInKg         Type double Properties Create, Filter, Nill...
  9  Co2PsgrMileLongHaulInKg        Type double Properties Create, Filter, Nill...
 10  Co2PsgrMileMediumHaulInKg      Type double Properties Create, Filter, Nill...
 11  Co2PsgrMileShortHaulInKg       Type double Properties Create, Filter, Nill...
 12  DistanceUnit                   Type picklist Properties Create, Defaulted ...
 13  EmissionFactorDataSource       Type textarea Properties Create, Nillable, ...
 14  EmissionFactorUpdateYear       Type picklist Properties Create, Filter, Gr...
 15  LastReferencedDate             Type dateTime Properties Filter, Nillable, ...
 16  LastViewedDate                 Type dateTime Properties Filter, Nillable, ...
 17  LongHaulMinimumDistance        Type double Properties Create, Filter, Nill...
 18  MediumHaulMaximumDistance      Type double Properties Create, Filter, Nill...
 19  N2oPsgrKmLongHaulInKgCo2e      Type double Properties Create, Filter, Nill...
 20  N2oPsgrKmMediumHaulInKgCo2e    Type double Properties Create, Filter, Nill...
 21  N2oPsgrKmShortHaulInKgCo2e     Type double Properties Create, Filter, Nill...
 22  N2oPsgrMileLongHaulInKgCo2e    Type double Properties Create, Filter, Nill...
 23  N2oPsgrMileMediumHaulInKgCo2e  Type double Properties Create, Filter, Nill...
 24  N2oPsgrMileShortHaulInKgCo2e   Type double Properties Create, Filter, Nill...
 25  Name                           Type string Properties Create, Filter, Grou...
 26  OwnerId                        Type reference Properties Create, Defaulted...
 27  ShortHaulMaximumDistance       Type double Properties Create, Filter, Nill...]
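For pages with a different nesting, the JavaScript string passed to execute_script can be assembled rather than typed by hand. The following is only a sketch under that assumption: build_shadow_query is a hypothetical helper name, and it supports only open shadow roots reachable with plain CSS selectors.

```python
import json


def build_shadow_query(host_selectors, target_selector):
    """Return a JavaScript snippet that pierces nested open shadow roots."""
    script = "return document"
    for selector in host_selectors:
        # json.dumps yields a safely quoted JavaScript string literal.
        script += f".querySelector({json.dumps(selector)}).shadowRoot"
    return script + f".querySelector({json.dumps(target_selector)})"


script = build_shadow_query(["doc-xml-content", "doc-content"], "table.featureTable")
print(script)
```

The resulting string can be handed to driver.execute_script(script) in place of the hard-coded query in the Code Block above.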