I am currently trying to get the href values out of the following web page structure:
<div style="something">  # THIS IS THE MAIN DIV I CAN GET
  <div>  # First ROW sub-div under the main div
    <div>  # SUB-SUB-DIV
      <a class="egaiegeigaegeigaegge" href="link_I_need">Text</a>  # First HREF
    <div>  # SUB-SUB-DIV
      <a class="egaegegaegaeg" href="link_I_need">Text</a>  # Second HREF
    <div>  # SUB-SUB-DIV
      <a class="arhrharhrahrah" href="link_I_need">Text</a>  # Third HREF
  <div>  # Second ROW sub-div under the main div
    <div>  # SUB-SUB-DIV
      <a class="arhahrhahr" href="link_I_need">Text</a>  # First HREF
    <div>  # SUB-SUB-DIV
      <a class="eagregargreg" href="link_I_need">Text</a>  # Second HREF
    <div>  # SUB-SUB-DIV
      <a class="aegaegregrege" href="link_I_need">Text</a>  # Third HREF
  ...
  ...
</div>
Using Python, Selenium and ChromeDriver, I can read the main div "something":
main_elem = browser.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]")
Now, from here I am struggling to use Selenium correctly to get all the href links for all the sub-sub-divs.
Do you have any idea how I can easily get those? Thank you!
PS: I can see that the first sub-sub-div has the following xpath:
/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[1]
Then the second:
/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[2]
and so on, while the first sub-sub-div of the second row has the XPath:
/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[2]/div[1]
so there it is div[2] rather than div[1], and so on.
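The XPaths in the PS differ only in the row index (div[1] vs div[2] right after .../article/div[2]/div) and the sub-sub-div index (the last div[...]), so the links can be collected by walking relative paths instead of building one absolute XPath per link. The sketch below is a hypothetical offline illustration: it runs that walk with Python's standard xml.etree.ElementTree on a well-formed copy of the structure above, with made-up unique hrefs. The posted paths also suggest, though only the live page can confirm it, that main_elem (ending in .../article/div[2]/div/div[1]) is the first row itself, so the div that contains every row would be its parent, .../article/div[2]/div.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page in the question:
# class attributes dropped, hrefs made unique so the output can be checked.
PAGE = """
<div style="something">
  <div>
    <div><a href="row1_link1">Text</a></div>
    <div><a href="row1_link2">Text</a></div>
    <div><a href="row1_link3">Text</a></div>
  </div>
  <div>
    <div><a href="row2_link1">Text</a></div>
    <div><a href="row2_link2">Text</a></div>
    <div><a href="row2_link3">Text</a></div>
  </div>
</div>
""".strip()


def collect_links(parent):
    """Return (row, column, href) tuples, walking rows, then sub-sub-divs.

    The 1-based indexes match the div[i]/div[j] steps in the question's XPaths.
    In Selenium the same walk would call find_elements(By.XPATH, "./div") on the
    parent and on each row, then get_attribute("href") on each link.
    """
    links = []
    for i, row in enumerate(parent.findall("./div"), start=1):
        for j, cell in enumerate(row.findall("./div"), start=1):
            link = cell.find("./a")
            if link is not None:  # skip sub-sub-divs without a link
                links.append((i, j, link.get("href")))
    return links


main_div = ET.fromstring(PAGE)
for row, col, href in collect_links(main_div):
    print(f"div[{row}]/div[{col}] -> {href}")
```

If that reading of the paths is right, a relative search started from main_elem would only see the first row's links.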
Answer:
Once you have the main (parent) element, you can get all of its descendant <a> elements that have an href attribute and read their values, as follows:
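from selenium.webdriver.common.by import By

# ".//" keeps the search relative to main_elem; "[@href]" keeps only <a> tags with an href attribute
children = main_elem.find_elements(By.XPATH, ".//a[@href]")
for child in children:
    href = child.get_attribute("href")
    print(href)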
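The XPath predicate needs the @ to test an attribute: .//a[@href] selects <a> elements that have an href attribute, while .//a[href] would select <a> elements that have a child element named href, which normally matches nothing. Python's standard xml.etree.ElementTree implements both predicate forms with that same meaning, so the difference can be checked offline on a hypothetical snippet in which one anchor deliberately has no href:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for two rows of the page, with unique hrefs
# and one <a> that has no href attribute at all.
PAGE = """<div style="something">
  <div>
    <div><a class="c1" href="row1_link1">Text</a></div>
    <div><a class="c2" href="row1_link2">Text</a></div>
  </div>
  <div>
    <div><a class="c3" href="row2_link1">Text</a></div>
    <div><a class="c4">No link</a></div>
  </div>
</div>"""

main_elem = ET.fromstring(PAGE)

# [@href]: <a> elements that HAVE an href attribute (the anchor without one is skipped).
with_attr = [a.get("href") for a in main_elem.findall(".//a[@href]")]
# [href]: <a> elements that have a CHILD ELEMENT named <href>; none exist here.
with_child = main_elem.findall(".//a[href]")

print(with_attr)   # ['row1_link1', 'row1_link2', 'row2_link1']
print(with_child)  # []
```

A browser's XPath engine, which Selenium uses for By.XPATH, draws the same distinction between the two predicates.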
Answer:
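To extract the values of all the href attributes, you can use WebDriverWait with visibility_of_all_elements_located(), which keeps polling until the matching elements are present and visible (useful when the links are rendered dynamically), together with either of the following locator strategies: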
Using CSS_SELECTOR:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[style='something'] div div>a")))])
Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@style='something']//div//div/a")))])
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
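The XPath locator in this answer anchors the search on the main div's style attribute, so links elsewhere on the page that also sit inside nested divs are not picked up. Below is a rough offline check with Python's standard xml.etree.ElementTree on a hypothetical page that adds one unrelated link outside the main div. ElementTree implements only a subset of XPath and expects paths relative to an element, hence the leading "."; treat this as an illustration of which nodes the expression selects, not a substitute for the browser's engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the main div from the question plus an unrelated link
# that also sits inside div/div, but outside div[@style='something'].
PAGE = """
<html><body>
  <div style="something">
    <div>
      <div><a href="row1_link1">Text</a></div>
      <div><a href="row1_link2">Text</a></div>
    </div>
    <div>
      <div><a href="row2_link1">Text</a></div>
      <div><a href="row2_link2">Text</a></div>
    </div>
  </div>
  <div><div><a href="outside_link">Other</a></div></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Same steps as the answer's XPath, made relative with a leading ".".
hrefs = [a.get("href") for a in root.findall(".//div[@style='something']//div//div/a")]
print(hrefs)  # ['row1_link1', 'row1_link2', 'row2_link1', 'row2_link2']

# Without the style predicate, the unrelated link matches as well.
all_hrefs = [a.get("href") for a in root.findall(".//div//div/a")]
print("outside_link" in all_hrefs)  # True
```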