xpath - find all last descendant hyperlinks within a list of items

Time:08-30

I have a dynamically generated page with a huge list which contains NESTED LINK ELEMENTS.

  • Sometimes the list items contain ONE hyperlink and sometimes they contain TWO hyperlinks.

  • The depth/level of the nested links varies, so it is different every time I refresh the page.

  • IMPORTANT: Within each list item at least one link has link text. These are the links I want.

  • BUT: The parent element of the link text varies every time I refresh the page.

       <div >
            <div>
                 <a href="https://www.testpage/user1"></a>
            </div>
            <div>
                 <a href="https://www.testpage/user2">
                      <span>
                            <div>user2</div>
                      </span>
                 </a>
            </div>
       </div>
    
    
       <div >
            <div>
                 <a href="https://www.testpage/user3">
                     <div>user3</div>
                 </a>
            </div>
       </div>
    
    
    
       <div >
            <div>
                 <div>
                      <a href="https://www.testpage/user4">
                           <span>
                                 <span>user4</span>
                           </span>
                      </a>
                 </div>
            </div>
       </div>
    
    
       <div >
            <div>
                 <div>
                      <div>
                           <a href="https://www.testpage/user5" />
                      </div> 
                 </div>
                 <div>
                      <a href="https://www.testpage/user6">
                           <div>
                                 <div>user6</div>
                           </div>
                      </a>
                 </div>
            </div>
       </div>
    

The result should be a list with user2, user3, user4 and user6.

  • I already tried div/a[last()] but this returns ALL 6 hyperlinks
  • And I tried (div/a)[last()] but this returns hyperlink 6 only
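Both results follow from how XPath applies a position predicate. The sketch below reproduces them offline as an approximation, not a browser: it uses Python's standard-library `xml.etree.ElementTree`, whose limited XPath subset evaluates `[last()]` per parent like XPath 1.0 does. The markup is a hypothetical minified copy of the question's HTML, wrapped in an assumed `<root>` element and with the empty links closed so it parses as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical minified copy of the question's four list items. The <root>
# wrapper is an assumption so the snippet parses as a single XML document.
SAMPLE = """<root>
<div><div><a href="https://www.testpage/user1"></a></div>
<div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div></div>
<div><div><a href="https://www.testpage/user3"><div>user3</div></a></div></div>
<div><div><div><a href="https://www.testpage/user4"><span><span>user4</span></span></a></div></div></div>
<div><div><div><div><a href="https://www.testpage/user5"/></div></div>
<div><a href="https://www.testpage/user6"><div><div>user6</div></div></a></div></div></div>
</root>"""

root = ET.fromstring(SAMPLE)

# a[last()] is evaluated per parent: an <a> matches when it is the last <a>
# child of its own <div>. Every link here is alone in its <div>, so all six match.
per_parent = root.findall(".//div/a[last()]")
print([a.get("href") for a in per_parent])

# (div/a)[last()] takes the last node of the whole result set instead.
# ElementTree has no (...) grouping, so emulate it by indexing the list.
whole_set_last = root.findall(".//div/a")[-1]
print(whole_set_last.get("href"))
```

Neither form looks at whether a link has text, which is why position alone cannot pick out user2, user3, user4 and user6.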

So my question is:

  • Which XPath is needed to get the LAST HYPERLINK DESCENDANTS OF ALL FOUR ITEMS?
  • Or in other words: how do I get the **HYPERLINKS WHERE THE HREF ATTRIBUTE EQUALS THE TEXT WITHIN THE LAST DESCENDANT ELEMENTS**?
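The second formulation pairs each link with its visible text. A minimal stdlib sketch of that check follows, under the assumption that "the href equals the text" means the last path segment of the href equals the link's whitespace-collapsed text (the same normalisation XPath's `normalize-space()` applies). The markup is a hypothetical minified copy of the question's HTML wrapped in an assumed `<root>` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical minified copy of the question's markup, wrapped in an assumed <root>.
SAMPLE = """<root>
<div><div><a href="https://www.testpage/user1"></a></div>
<div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div></div>
<div><div><a href="https://www.testpage/user3"><div>user3</div></a></div></div>
<div><div><div><a href="https://www.testpage/user4"><span><span>user4</span></span></a></div></div></div>
<div><div><div><div><a href="https://www.testpage/user5"/></div></div>
<div><a href="https://www.testpage/user6"><div><div>user6</div></div></a></div></div></div>
</root>"""

root = ET.fromstring(SAMPLE)
matches = []
for a in root.iter("a"):
    # Text of the link and all its descendants, whitespace collapsed
    # (the same idea as XPath normalize-space()).
    text = " ".join("".join(a.itertext()).split())
    href = a.get("href", "")
    # Assumption: "href equals the text" means the last path segment of the
    # href is identical to the visible link text.
    if text and href.rsplit("/", 1)[-1] == text:
        matches.append((href, text))

for href, text in matches:
    print(text, href)
```

The empty links (user1, user5) drop out because their collapsed text is empty, so the text test alone already does most of the work.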

Answer:

To get the value of the href attributes of the <a> elements that have link text, i.e. user2, user3, user4 and user6, use WebDriverWait with visibility_of_all_elements_located() and the following locator strategy:

  • Using XPATH:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='listitem']//a[.//div[starts-with(., 'user')] or .//span[starts-with(., 'user')]]")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
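The predicate inside the brackets can be sanity-checked offline before running it against the live page. This is a plain-Python approximation, not Selenium or a browser XPath engine: it scans every `<a>` in a hypothetical minified copy of the question's markup (as `//a[...]` would) and keeps a link if some `<div>` or `<span>` below it has a string value starting with `user`, which is what `starts-with(., 'user')` tests. The location steps before `a` are not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical minified copy of the question's markup; the <root> wrapper is
# an assumption so the snippet parses as one XML document.
SAMPLE = """<root>
<div><div><a href="https://www.testpage/user1"></a></div>
<div><a href="https://www.testpage/user2"><span><div>user2</div></span></a></div></div>
<div><div><a href="https://www.testpage/user3"><div>user3</div></a></div></div>
<div><div><div><a href="https://www.testpage/user4"><span><span>user4</span></span></a></div></div></div>
<div><div><div><div><a href="https://www.testpage/user5"/></div></div>
<div><a href="https://www.testpage/user6"><div><div>user6</div></div></a></div></div></div>
</root>"""


def matches_predicate(a):
    """Mirror a[.//div[starts-with(., 'user')] or .//span[starts-with(., 'user')]].

    Every <div>/<span> below the link is checked; its full string value
    (own text plus all descendant text) must start with 'user'.
    """
    return any(
        el.tag in ("div", "span") and "".join(el.itertext()).startswith("user")
        for el in a.iter()
        if el is not a
    )


root = ET.fromstring(SAMPLE)
# root.iter("a") visits every <a> in document order, like //a.
hrefs = [a.get("href") for a in root.iter("a") if matches_predicate(a)]
print(hrefs)
```

Because the predicate only depends on what is below each `<a>`, it is unaffected by how deeply the links are nested, which is the varying part of the page.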
    

Answer:

You mentioned that you want all the links that contain text, i.e. the <a> elements whose child element (span or div) contains text.
If so, you can use the following XPath:

//div[@class='listitem']//a[@href and normalize-space()]

To get (and print) all these links with Selenium, use the following loop:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 20 seconds for all matching links to become visible
# (driver is an already-initialised WebDriver instance)
wait = WebDriverWait(driver, 20)
links = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='listitem']//a[@href and normalize-space()]")))
for link in links:
    # Print the URL of each link that has visible text
    print(link.get_attribute("href"))

//a[@href and normalize-space()] means: an <a> element that has an href attribute (any value) and whose text content, including the text of all its descendants, is not empty after trimming whitespace. A plain text() test would only look at text nodes directly under <a>, so its result depends on the whitespace between tags rather than on the link text.
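The difference between testing `text()` and testing the whole string value of a link can be illustrated with Python's standard-library `xml.etree.ElementTree`. This is an analogy, not a browser XPath engine: the direct text nodes of an element correspond to its `.text` plus the `.tail` of each child, and the full string value corresponds to `itertext()`. The three link variants below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same user2 link pretty-printed and minified,
# plus a link that contains nothing but whitespace.
SAMPLES = {
    "pretty": '<a href="https://www.testpage/user2">\n  <span><div>user2</div></span>\n</a>',
    "minified": '<a href="https://www.testpage/user2"><span><div>user2</div></span></a>',
    "blank": '<a href="https://www.testpage/user1">\n  </a>',
}


def has_text_node_child(a):
    # Analogue of a[text()]: true if <a> has ANY direct child text node,
    # whitespace included. ElementTree stores those in a.text and child.tail.
    return a.text is not None or any(child.tail is not None for child in a)


def has_visible_text(a):
    # Analogue of a[normalize-space()]: collapse the whole string value
    # (text of all descendants) and check that something is left.
    return " ".join("".join(a.itertext()).split()) != ""


results = {}
for name, markup in SAMPLES.items():
    a = ET.fromstring(markup)
    results[name] = (has_text_node_child(a), has_visible_text(a))
    print(name, results[name])
```

In the minified variant the `text()`-style check misses a link that does have visible text, and in the blank variant it accepts a link whose only content is whitespace; the string-value check gets both right.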
