I have a dynamically generated page with a huge list that contains nested link elements.
Sometimes a list item contains one hyperlink and sometimes it contains two.
The nesting depth of the links varies, so it is different every time I refresh the page.
Important: within each list item, at least one link has link text. These are the links I want.
But the parent element of the link text varies every time I refresh the page.
<div class="listitem">
  <div>
    <a href="https://www.testpage/user1">
  </div>
  <div>
    <a href="https://www.testpage/user2">
      <span>
        <div>user2</div>
      </span>
    </a>
  </div>
</div>
<div class="listitem">
  <div>
    <a href="https://www.testpage/user3">
      <div>user3</div>
    </a>
  </div>
</div>
<div class="listitem">
  <div>
    <div>
      <a href="https://www.testpage/user4">
        <span>
          <span>user4</span>
        </span>
      </a>
    </div>
  </div>
</div>
<div class="listitem">
  <div>
    <div>
      <div>
        <a href="https://www.testpage/user5" />
      </div>
    </div>
    <div>
      <a href="https://www.testpage/user6">
        <div>
          <div>user6</div>
        </div>
      </a>
    </div>
  </div>
</div>
The result should be a list with user2, user3, user4 and user6.
- I already tried div/a[last()], but this returns all 6 hyperlinks.
- And I tried (div/a)[last()], but this returns hyperlink 6 only.
So my question is:
- Which XPath is needed to get the last hyperlink descendant of each of the four items?
- Or in other words: how do I get the hyperlinks where the href attribute equals the text within the last descendant elements?
CodePudding user response:
Given the HTML:
<div class="listitem">
  <div>
    <a href="https://www.testpage/user1">
  </div>
  <div>
    <a href="https://www.testpage/user2">
      <span>
        <div>user2</div>
      </span>
    </a>
  </div>
</div>
<div class="listitem">
  <div>
    <a href="https://www.testpage/user3">
      <div>user3</div>
    </a>
  </div>
</div>
<div class="listitem">
  <div>
    <div>
      <a href="https://www.testpage/user4">
        <span>
          <span>user4</span>
        </span>
      </a>
    </div>
  </div>
</div>
<div class="listitem">
  <div>
    <div>
      <div>
        <a href="https://www.testpage/user5" />
      </div>
    </div>
    <div>
      <a href="https://www.testpage/user6">
        <div>
          <div>user6</div>
        </div>
      </a>
    </div>
  </div>
</div>
To get the value of the href attributes of the <a> elements that have link text, i.e. user2, user3, user4 and user6, you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use the following locator strategy:
Using XPATH:
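# Match <a> at any depth below a list item, keeping only anchors that contain
# a nested div or span whose text starts with 'user'.
locator = (By.XPATH, "//div[@class='listitem']//a[.//self::div[starts-with(., 'user')] or .//self::span[starts-with(., 'user')]]")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(locator))])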
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
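On the two attempts from the question: div/a[last()] applies last() per parent div, and since every div here has at most one <a> child, each link is the last one of its own parent, so all six come back. (div/a)[last()] takes the last node of the whole result set, hence only link 6. In XPath 1.0, a "last link inside each item" selection can be written relative to each item, for example //div[@class='listitem']/descendant::a[last()]. This only gives the wanted links if the link with text always comes last within its item, which holds for the sample but is an assumption about the real page. The rule itself can be checked offline with a minimal standard-library sketch; it assumes class="listitem" on each item and a sample repaired to well-formed XML (the empty links are closed), and last_link_per_item is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Sample from the question, repaired to well-formed XML (assumption:
# each item carries class="listitem" and the empty <a> elements are closed).
SAMPLE = """
<root>
  <div class="listitem">
    <div><a href="https://www.testpage/user1"/></div>
    <div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div>
  </div>
  <div class="listitem">
    <div><a href="https://www.testpage/user3"><div>user3</div></a></div>
  </div>
  <div class="listitem">
    <div><div><a href="https://www.testpage/user4"><span><span>user4</span></span></a></div></div>
  </div>
  <div class="listitem">
    <div>
      <div><div><a href="https://www.testpage/user5"/></div></div>
      <div><a href="https://www.testpage/user6"><div><div>user6</div></div></a></div>
    </div>
  </div>
</root>
"""


def last_link_per_item(xml_text):
    """Return the href of the last <a> descendant of every list item."""
    root = ET.fromstring(xml_text)
    hrefs = []
    for item in root.findall("./div[@class='listitem']"):
        anchors = item.findall(".//a")  # all descendant links, in document order
        if anchors:
            hrefs.append(anchors[-1].get("href"))
    return hrefs


last_links = last_link_per_item(SAMPLE)
print(last_links)
```

The per-item loop plays the role of the relative descendant::a[last()] step: the "last" position is evaluated inside each list item rather than per parent div or over the whole page.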
CodePudding user response:
You mentioned that you want to get all the links containing text, i.e. the links from a elements that contain a child element (span or div) containing text.
If so you can use the following XPath:
//div[@class='listitem']//a[@href and(text())]
If you want to get (and print) all these links with Selenium it can be done with the following loop:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
links = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='listitem']//a[@href and(text())]")))
for link in links:
    print(link.get_attribute("href"))
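//a[@href and(text())] means: an element with the a tag that has an href attribute (no attribute value is specified, i.e. any value) and has text (any text content). Note that text() tests only the text nodes that are direct children of a, including whitespace-only ones; if the link text sits solely in nested elements, normalize-space() in place of text() is a more robust test.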
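The selection rule itself (keep every link that has an href and some non-whitespace text anywhere inside it) can be checked without a browser. The following is a minimal sketch using only the Python standard library, not Selenium; it assumes the sample is repaired to well-formed XML with class="listitem" on each item, and links_with_text is a hypothetical helper name. The first link contains only a space to show that whitespace alone does not count as text:

```python
import xml.etree.ElementTree as ET

# Sample from the question, repaired to well-formed XML (assumption).
# The user1 link holds a single space, the user5 link is empty.
SAMPLE = """
<root>
  <div class="listitem">
    <div><a href="https://www.testpage/user1"> </a></div>
    <div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div>
  </div>
  <div class="listitem">
    <div><a href="https://www.testpage/user3"><div>user3</div></a></div>
  </div>
  <div class="listitem">
    <div><div><a href="https://www.testpage/user4"><span><span>user4</span></span></a></div></div>
  </div>
  <div class="listitem">
    <div>
      <div><div><a href="https://www.testpage/user5"/></div></div>
      <div><a href="https://www.testpage/user6"><div><div>user6</div></div></a></div>
    </div>
  </div>
</root>
"""


def links_with_text(xml_text):
    """Return (text, href) for every <a> with an href and non-blank text."""
    root = ET.fromstring(xml_text)
    result = []
    for anchor in root.iter("a"):
        # itertext() walks the anchor and all nested elements, so the
        # text is found no matter how deep the span/div wrapping goes.
        text = "".join(anchor.itertext()).strip()
        href = anchor.get("href")
        if href and text:
            result.append((text, href))
    return result


links = links_with_text(SAMPLE)
for text, href in links:
    print(text, href)
```

Joining itertext() and stripping it mirrors what normalize-space() checks in XPath: the whole text content of the element, ignoring surrounding whitespace.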