Python Selenium select all the href from a main <div>

Time:01-24

I am currently trying to get the href out of the following web page structure:

<div style="something"> # THIS IS THE MAIN DIV I CAN GET
    <div > # First ROW sub-div under the main div
        <div > # SUB-SUB-DIV
            <a class="egaiegeigaegeigaegge" href="link_I_need">Text</a> # First HREF
        <div > # SUB-SUB-DIV
            <a class="egaegegaegaeg" href="link_I_need">Text</a> # Second HREF
        <div > # SUB-SUB-DIV
            <a class="arhrharhrahrah" href="link_I_need">Text</a> # Third HREF

    <div > # Second ROW sub-div under the main div
        <div > # SUB-SUB-DIV
            <a class="arhahrhahr" href="link_I_need">Text</a> # First HREF
        <div > # SUB-SUB-DIV
            <a class="eagregargreg" href="link_I_need">Text</a> # Second HREF
        <div > # SUB-SUB-DIV
            <a class="aegaegregrege" href="link_I_need">Text</a> # Third HREF
        ...
        ...
</div>

Using Python Selenium and ChromeDriver I can read the main div "something":

main_elem = browser.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]")

From here, I am struggling to use Selenium correctly to get the href links from all the sub-sub-divs.

Do you have any idea on how I can easily get those? Thank you

PS: I can see that the first sub-sub-div has the following xpath:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[1]

Then the second:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[2]

and so on while the second row sub-sub-div xpath is:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[2]/div[1]

so there's div[2] rather than div[1], and so on.

CodePudding user response:

Once you have the main (parent) element, you can find all of its descendant <a> elements that have an href attribute and read their values, as follows:

children = main_elem.find_elements(By.XPATH, ".//a[@href]")
for child in children:
    href = child.get_attribute("href")
    print(href)
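
The relative expression can also be checked without a browser. The following is only a sketch, not Selenium code: it runs `.//a[@href]` (the `@` marks href as an attribute) through the standard library's ElementTree against a hypothetical, cleaned-up copy of the markup from the question. The sample links, class names, and the closing tags are assumptions made so that the snippet parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cleaned-up copy of the structure from the question.
# Closing tags were added so the fragment parses as XML.
SAMPLE = """
<div style="something">
  <div>
    <div><a class="c1" href="link_1">Text</a></div>
    <div><a class="c2" href="link_2">Text</a></div>
    <div><a class="c3" href="link_3">Text</a></div>
  </div>
  <div>
    <div><a class="c4" href="link_4">Text</a></div>
    <div><a name="no-link">Text</a></div>
  </div>
</div>
"""

main_elem = ET.fromstring(SAMPLE)

# Every <a> descendant of main_elem that carries an href attribute,
# in document order. The anchor without href is skipped.
hrefs = [a.get("href") for a in main_elem.findall(".//a[@href]")]
print(hrefs)

# The same links grouped by row: one inner list per first-level <div>.
rows = [
    [a.get("href") for a in row.findall("./div/a[@href]")]
    for row in main_elem.findall("./div")
]
print(rows)
```

With Selenium, the grouped variant would use `main_elem.find_elements(By.XPATH, "./div")` for the rows and `row.find_elements(By.XPATH, "./div/a[@href]")` inside each row. One difference to expect: ElementTree returns the raw attribute text, while Selenium's `get_attribute("href")` normally returns the resolved absolute URL.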

CodePudding user response:

To extract the values of all the href attributes, wait for the elements with WebDriverWait and visibility_of_all_elements_located(), then use either of the following locator strategies:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[style='something'] div div>a")))])
    
  • Using XPATH:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@style='something']//div//div/a")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
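
The locators above match every `<a>` under the nested divs, so the resulting list can contain `None` (Selenium's `get_attribute` returns `None` when an attribute is missing) or repeated links. A minimal sketch of a clean-up step follows. The helper name `unique_hrefs` and the `FakeAnchor` stand-in are hypothetical and exist only so the snippet runs without a browser; in real use you would pass the list of WebElements returned by the wait.

```python
def unique_hrefs(elements):
    """Return non-empty href values, keeping the first occurrence of each."""
    seen = set()
    result = []
    for elem in elements:
        href = elem.get_attribute("href")  # None if the attribute is absent
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


class FakeAnchor:
    """Hypothetical stand-in for a Selenium WebElement (demonstration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


anchors = [
    FakeAnchor("https://example.com/a"),
    FakeAnchor(None),                      # anchor without an href
    FakeAnchor("https://example.com/a"),   # duplicate link
    FakeAnchor("https://example.com/b"),
]
print(unique_hrefs(anchors))
```

The helper relies only on `get_attribute`, so it works the same way on the elements returned by `find_elements` or by the `WebDriverWait(...).until(...)` calls.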
    