XPath - find all last descendant hyperlinks within a list of items

I have a dynamically generated page with a huge list which contains NESTED LINK ELEMENTS.

Sometimes a list item contains ONE hyperlink and sometimes it contains TWO hyperlinks.
The depth/level of the nested links varies, so it is different every time I refresh the page.
IMPORTANT: Within each list item at least one link has link text. These are the links I want.

BUT: The parent element of the link text varies every time I refresh the page.

   <div class="listitem">
        <div>
             <a href="https://www.testpage/user1"></a>
        </div>
        <div>
             <a href="https://www.testpage/user2">
                  <span>
                        <div>user2</div>
                  </span>
             </a>
        </div>
   </div>


   <div class="listitem">
        <div>
             <a href="https://www.testpage/user3">
                 <div>user3</div>
             </a>
        </div>
   </div>



   <div class="listitem">
        <div>
             <div>
                  <a href="https://www.testpage/user4">
                       <span>
                             <span>user4</span>
                       </span>
                  </a>
             </div>
        </div>
   </div>


   <div class="listitem">
        <div>
             <div>
                  <div>
                       <a href="https://www.testpage/user5" />
                  </div> 
             </div>
             <div>
                  <a href="https://www.testpage/user6">
                       <div>
                             <div>user6</div>
                       </div>
                  </a>
             </div>
        </div>
   </div>

The result should be a list with user2, user3, user4 and user6.

I already tried div/a[last()], but this returns ALL 6 hyperlinks.
And I tried (div/a)[last()], but this returns hyperlink 6 only.

So my question is:

Which XPath is needed to get the LAST HYPERLINK DESCENDANTS OF ALL FOUR ITEMS?
Or in other words: How do I get the **HYPERLINKS WHOSE HREF ATTRIBUTE ENDS WITH THE TEXT WITHIN THEIR LAST DESCENDANT ELEMENT**?
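
The difference between the two attempts comes from where last() is evaluated. In a[last()] the predicate is applied per parent element, so every a that is the last a child of its own div is kept; with parentheses, last() is applied once to the whole collected node-set. A minimal sketch using Python's standard-library ElementTree (which supports only a subset of XPath) on a shortened, well-formed version of the list; the <root> wrapper and the class name "listitem" are assumptions:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the page; <root> and class="listitem" are assumed.
SAMPLE = """
<root>
  <div class="listitem">
    <div><a href="https://www.testpage/user1"></a></div>
    <div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div>
  </div>
  <div class="listitem">
    <div><a href="https://www.testpage/user3"><div>user3</div></a></div>
  </div>
</root>
"""

root = ET.fromstring(SAMPLE)

# a[last()]: evaluated per parent <div>. Each <a> here is the only (and
# therefore last) <a> child of its parent, so every link is returned.
per_parent = [a.get("href") for a in root.findall(".//div/a[last()]")]

# (div/a)[last()]: last() applies to the whole result set. ElementTree has
# no parenthesised form, so a Python index on the result list stands in for it.
whole_set = root.findall(".//div/a")[-1].get("href")

print(per_parent)
print(whole_set)
```

Neither form looks at whether a link has text, which is why a predicate on the link's content is needed instead of a positional one.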

CodePudding user response：

Given the HTML:

<div class="listitem">
    <div>
         <a href="https://www.testpage/user1"></a>
    </div>
    <div>
         <a href="https://www.testpage/user2">
              <span>
                    <div>user2</div>
              </span>
         </a>
    </div>
</div>
<div class="listitem">
    <div>
         <a href="https://www.testpage/user3">
             <div>user3</div>
         </a>
    </div>
</div>
<div class="listitem">
    <div>
         <div>
              <a href="https://www.testpage/user4">
                   <span>
                         <span>user4</span>
                   </span>
              </a>
         </div>
    </div>
</div>
<div class="listitem">
    <div>
         <div>
              <div>
                   <a href="https://www.testpage/user5" />
              </div> 
         </div>
         <div>
              <a href="https://www.testpage/user6">
                   <div>
                         <div>user6</div>
                   </div>
              </a>
         </div>
    </div>
</div>

To get the href values of the <a> elements that have link text (user2, user3, user4 and user6), wait for the elements with WebDriverWait and visibility_of_all_elements_located(), then use the following locator strategy:

Using XPATH:

# Match any <a> under a list item that has a descendant div or span whose text starts with 'user'
xpath = "//div[@class='listitem']//a[.//self::div[starts-with(., 'user')] or .//self::span[starts-with(., 'user')]]"
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, xpath)))
print([my_elem.get_attribute("href") for my_elem in elements])

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
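
The selection rule itself ("links inside a list item that carry text at any depth") can be checked without a browser. The following is only a sketch with the standard-library ElementTree, not Selenium; the helper name links_with_text, the <root> wrapper and the class name "listitem" are assumptions, and the markup is a compacted, well-formed copy of the HTML above:

```python
import xml.etree.ElementTree as ET

# Compacted stand-in for the page; <root> and class="listitem" are assumed.
SAMPLE = """
<root>
  <div class="listitem">
    <div><a href="https://www.testpage/user1"></a></div>
    <div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div>
  </div>
  <div class="listitem">
    <div><a href="https://www.testpage/user3"><div>user3</div></a></div>
  </div>
  <div class="listitem">
    <div><div><a href="https://www.testpage/user4"><span><span>user4</span></span></a></div></div>
  </div>
  <div class="listitem">
    <div>
      <div><div><a href="https://www.testpage/user5"></a></div></div>
      <div><a href="https://www.testpage/user6"><div><div>user6</div></div></a></div>
    </div>
  </div>
</root>
"""

def links_with_text(root, item_class="listitem"):
    """Return the hrefs of <a> elements in list items that contain non-blank text."""
    hrefs = []
    for item in root.iter("div"):
        if item.get("class") != item_class:
            continue  # only look inside list-item containers
        for a in item.iter("a"):
            # itertext() walks the text at every nesting depth below the link
            if a.get("href") and "".join(a.itertext()).strip():
                hrefs.append(a.get("href"))
    return hrefs

hrefs = links_with_text(ET.fromstring(SAMPLE))
print(hrefs)
```

Because the check ignores how deeply the text is nested and which element wraps it, it keeps working when the wrapper changes between span and div on each refresh.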

CodePudding user response：

You mentioned that you want all the links that contain text, i.e. the a elements whose child element (span or div) holds the text.
If so you can use the following XPath:

//div[@class='listitem']//a[@href and(text())]

If you want to get (and print) all these links with Selenium it can be done with the following loop:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)
links = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='listitem']//a[@href and(text())]")))
for link in links:
    print(link.get_attribute("href"))

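//a[@href and(text())] means: an a element that has an href attribute (with any value) and at least one text node as a direct child. In the HTML above the visible text sits inside a nested span or div, so the match relies on the whitespace between <a> and its first child element, which is itself a text node. The empty links have no text node at all and are skipped.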
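
Since text() only inspects direct text-node children, the match can change if the same markup is served without whitespace, whereas a test on the element's whole string value (in XPath, //a[@href and normalize-space()]) does not. A small sketch of that difference with the standard-library ElementTree; the two helper functions are hypothetical approximations of the XPath tests, not Selenium or XPath APIs:

```python
import xml.etree.ElementTree as ET

def has_direct_text_node(a):
    """Approximates a[text()]: true if <a> has any direct text child,
    even one that contains only whitespace."""
    pieces = [a.text] + [child.tail for child in a]
    return any(pieces)

def has_visible_text(a):
    """Approximates a[normalize-space()]: true if non-blank text exists at any depth."""
    return bool("".join(a.itertext()).strip())

pretty = ET.fromstring(
    '<a href="https://www.testpage/user2">\n  <span><div>user2</div></span>\n</a>'
)
minified = ET.fromstring(
    '<a href="https://www.testpage/user2"><span><div>user2</div></span></a>'
)
empty = ET.fromstring('<a href="https://www.testpage/user1"></a>')

for name, link in [("pretty", pretty), ("minified", minified), ("empty", empty)]:
    print(name, has_direct_text_node(link), has_visible_text(link))
```

The indented link passes both tests, the minified one passes only the string-value test, and the empty link passes neither.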