I have been trying to scrape data from the following site:
CodePudding user response:
@CalGrace The page contains shadow roots. See "Shadow DOM in Selenium" for more details.
The following code should work for the Chrome browser:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
# Download a matching chromedriver and start Chrome with it
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(options=chrome_options, service=s)
driver.get(
    "https://developer.salesforce.com/docs/atlas.en-us.netzero_cloud_dev_guide.meta/netzero_cloud_dev_guide/sforce_api_objects_airtravelemssnfctr.htm#maincontent"
)

# The outer shadow host sits in the regular DOM
main_content = driver.find_element(By.CSS_SELECTOR, "main[id='maincontent'] doc-xml-content")
main_content_shadow_root = main_content.shadow_root

# The inner shadow host lives inside the outer shadow root
doc_content = main_content_shadow_root.find_element(By.CSS_SELECTOR, "doc-content")
doc_content_shadow_root = doc_content.shadow_root

# The table is inside the inner shadow root
table = doc_content_shadow_root.find_element(By.CSS_SELECTOR, ".featureTable.sort_table")
print(table.tag_name)
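The two hops through the shadow roots above follow the same pattern, so they could be folded into a small helper. The sketch below is only an illustration and not part of the original answer: find_in_nested_shadow is a hypothetical name, and it relies only on the find_element method and shadow_root attribute that Selenium 4 exposes. The string "css selector" is the value of By.CSS_SELECTOR, written out so the helper needs no imports.

```python
CSS = "css selector"  # same value as selenium's By.CSS_SELECTOR


def find_in_nested_shadow(driver, host_selectors, target_selector):
    """Walk through nested shadow hosts, then locate the target element.

    host_selectors: CSS selectors for each shadow host, outermost first.
    target_selector: CSS selector for the element inside the innermost shadow root.
    """
    context = driver  # start the search from the document
    for selector in host_selectors:
        host = context.find_element(CSS, selector)
        context = host.shadow_root  # later lookups are scoped to this shadow root
    return context.find_element(CSS, target_selector)
```

With the selectors from the code above, the call would be find_in_nested_shadow(driver, ["main[id='maincontent'] doc-xml-content", "doc-content"], ".featureTable.sort_table").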
CodePudding user response:
shadow_root
The shadow_root attribute returns the shadow root of the element if there is one, or raises an error if there is none. It only works from Chromium 96 onwards; earlier versions of Chromium-based browsers throw an assertion exception.
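Because the attribute depends on the browser version, a script may want to check the version before calling shadow_root. Here is a minimal sketch, assuming the session reports the standard W3C browserVersion capability. supports_shadow_root is a hypothetical helper name, and the default threshold of 96 is taken from the note above, so it applies to Chromium-based browsers only.

```python
def supports_shadow_root(capabilities, minimum_major=96):
    """Return True if the reported browser major version is at least minimum_major."""
    version = str(capabilities.get("browserVersion", ""))
    major = version.split(".")[0]  # e.g. "96" from "96.0.4664.45"
    # A missing or non-numeric version is treated as unsupported
    return major.isdigit() and int(major) >= minimum_major
```

It would be called as supports_shadow_root(driver.capabilities) right after the driver is created.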
Solution
With Google Chrome v96 or above and Selenium 4, you can reach the Fields table, which is nested within multiple #shadow-root (open) nodes, using the following locator strategies:
Code Block:
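from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
# Driver setup as in the previous answer
options = Options()
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://developer.salesforce.com/docs/atlas.en-us.netzero_cloud_dev_guide.meta/netzero_cloud_dev_guide/sforce_api_objects_airtravelemssnfctr.htm#maincontent')
# Step through both shadow roots to reach the table
shadow_host = driver.find_element(By.CSS_SELECTOR, 'doc-xml-content')
shadow_root = shadow_host.shadow_root
shadow_child = shadow_root.find_element(By.CSS_SELECTOR, 'doc-content')
shadow_grand_child = shadow_child.shadow_root
element = shadow_grand_child.find_element(By.CSS_SELECTOR, 'table.featureTable')
print(element.get_attribute("outerHTML"))
driver.quit()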
Console Output:
<table summary=""> <thead align="left"> <tr> <th id="d51659e96">Field</th> <th id="d51659e99">Details</th> </tr> </thead> <tbody > <tr> <td headers="d51659e96" data-title="Field"><span >Ch4PsgrKmLongHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The CH4 emissions per passenger-kilometer in CO2e from long-haul flights. </dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >Ch4PsgrKmMediumHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The CH4 emissions per passenger-kilometer in CO2e from medium-haul flights. </dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >DistanceUnit</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >picklist</dd> <dt >Properties</dt> <dd >Create, Defaulted on create, Filter, Group, Nillable, Restricted picklist, Sort, Update</dd> <dt >Description</dt> <dd > The unit of measure for the distance. </dd> <dd >Possible values are: <ul > <li ><samp >Kilometers</samp></li> <li ><samp >Miles</samp></li> </ul> </dd> <dd >The default value is 'Kilometers'.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >EmissionFactorDataSource</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >textarea</dd> <dt >Properties</dt> <dd >Create, Nillable, Update</dd> <dt >Description</dt> <dd > The source of the emissions factor reference data. 
</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >EmissionFactorUpdateYear</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >picklist</dd> <dt >Properties</dt> <dd >Create, Filter, Group, Nillable, Restricted picklist, Sort, Update</dd> <dt >Description</dt> <dd > The year in which this reference data for the emissions factor was most recently updated. </dd> <dd >Possible values are: <ul > <li ><samp >2000</samp></li> <li ><samp >2001</samp></li> <li ><samp >2002</samp></li> <li ><samp >2003</samp></li> <li ><samp >2004</samp></li> <li ><samp >2005</samp></li> <li ><samp >2006</samp></li> <li ><samp >2007</samp></li> <li ><samp >2008</samp></li> <li ><samp >2009</samp></li> <li ><samp >2010</samp></li> <li ><samp >2011</samp></li> <li ><samp >2012</samp></li> <li ><samp >2013</samp></li> <li ><samp >2014</samp></li> <li ><samp >2015</samp></li> <li ><samp >2016</samp></li> <li ><samp >2017</samp></li> <li ><samp >2018</samp></li> <li ><samp >2019</samp></li> <li ><samp >2020</samp></li> <li ><samp >2021</samp></li> <li ><samp >2022</samp></li> <li ><samp >2023</samp></li> <li ><samp >2024</samp></li> <li ><samp >2025</samp></li> <li ><samp >2026</samp></li> <li ><samp >2027</samp></li> <li ><samp >2028</samp></li> <li ><samp >2029</samp></li> <li ><samp >2030</samp></li> <li ><samp >2031</samp></li> <li ><samp >2032</samp></li> <li ><samp >2033</samp></li> <li ><samp >2034</samp></li> <li ><samp >2035</samp></li> <li ><samp >2036</samp></li> <li ><samp >2037</samp></li> <li ><samp >2038</samp></li> <li ><samp >2039</samp></li> <li ><samp >2040</samp></li> </ul> </dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >LastReferencedDate</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >dateTime</dd> <dt >Properties</dt> <dd >Filter, Nillable, Sort</dd> <dt >Description</dt> <dd >The timestamp for when the current user last viewed a record related to 
this record.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >LastViewedDate</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >dateTime</dd> <dt >Properties</dt> <dd >Filter, Nillable, Sort</dd> <dt >Description</dt> <dd >The timestamp for when the current user last viewed this record. If this value is null, this record might only have been referenced (LastReferencedDate) and not viewed. </dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >LongHaulMinimumDistance</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The minimum distance for a long-haul flight that’s adjusted according to the short-haul or medium-haul distances. </dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >MediumHaulMaximumDistance</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The maximum distance of a medium-haul flight. </dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >N2oPsgrKmLongHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The N2O emissions per passenger-kilometer in CO2e from long-haul flights. 
</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >N2oPsgrKmMediumHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The N2O emissions per passenger-kilometer in CO2e from medium-haul flights.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >N2oPsgrKmShortHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The N2O emissions per passenger-kilometer in CO2e from short-haul flights.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >N2oPsgrMileLongHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The N2O emissions per passenger-mile in CO2e from long-haul flights. 
</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >N2oPsgrMileMediumHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The N2O emissions per passenger-mile in CO2e from medium-haul flights.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >N2oPsgrMileShortHaulInKgCo2e</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The N2O emissions per passenger-mile in CO2e from short-haul flights.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >Name</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >string</dd> <dt >Properties</dt> <dd >Create, Filter, Group, idLookup, Sort, Update</dd> <dt >Description</dt> <dd >Name of the account.</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >OwnerId</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >reference</dd> <dt >Properties</dt> <dd >Create, Defaulted on create, Filter, Group, Sort, Update</dd> <dt >Description</dt> <dd >The ID of the user who owns this record. </dd> <dd >This is a polymorphic relationship field.</dd> <dt >Relationship Name</dt> <dd >Owner</dd> <dt >Relationship Type</dt> <dd >Lookup</dd> <dt >Refers To</dt> <dd >Group, User</dd> </dl> </td> </tr> <tr> <td headers="d51659e96" data-title="Field"><span >ShortHaulMaximumDistance</span></td> <td headers="d51659e99" data-title="Details"> <dl > <dt >Type</dt> <dd >double</dd> <dt >Properties</dt> <dd >Create, Filter, Nillable, Sort, Update</dd> <dt >Description</dt> <dd > The maximum distance of a short-haul flight. </dd> </dl> </td> </tr> </tbody> </table>
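Since the goal is to scrape the field data, the printed outerHTML could be turned into one record per table row. The following is a rough sketch using only Python's standard html.parser module. FieldTableParser and parse_field_table are hypothetical names, and the parser assumes the structure seen in the console output above: a data-title attribute on each td, and dt/dd pairs inside the Details cell.

```python
from html.parser import HTMLParser


class FieldTableParser(HTMLParser):
    """Collect one dict per table row: the field name plus each dt/dd pair."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = None      # row currently being built
        self._cell = None     # data-title of the current <td>
        self._label = None    # text of the most recent <dt>
        self._capture = None  # "dt" or "dd" while inside one
        self._text = []       # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._cell = dict(attrs).get("data-title")
            if self._cell == "Field":
                self._row = {}  # a Field cell starts a new row
                self._text = []
        elif tag in ("dt", "dd"):
            self._capture = tag
            self._text = []

    def handle_data(self, data):
        if self._cell == "Field" or self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Join the pieces and collapse runs of whitespace
        text = " ".join(" ".join(self._text).split())
        if tag == "td":
            if self._cell == "Field":
                self._row["Field"] = text
            self._cell = None
        elif tag == "dt":
            self._label = text
            self._capture = None
        elif tag == "dd" and self._row is not None and self._label:
            # Several <dd> may follow one <dt>; append them to the same key
            previous = self._row.get(self._label, "")
            self._row[self._label] = (previous + " " + text).strip()
            self._capture = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None


def parse_field_table(outer_html):
    """Return a list of dicts such as {"Field": ..., "Type": ..., "Description": ...}."""
    parser = FieldTableParser()
    parser.feed(outer_html)
    parser.close()
    return parser.rows
```

With the code block above, it would be called as parse_field_table(element.get_attribute("outerHTML")) before driver.quit().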