How to get the text before & after the hyperlink using Selenium and Xpath

Time:05-05

I am trying to use XPath with Selenium to find cases where there is no whitespace before or after a hyperlink.

e.g.

<p>Click on this<a href="#">link</a>to access the data</p>

This renders as: Click on thislinkto access the data

Problem: locate all the <a> elements and test whether each one has whitespace before and after it.

Is there an elegant way to get the text immediately before and after an anchor? I am thinking of using an XPath query such as //a/parent::*, which returns the <p> element, but then the <a> tag is not separated from the surrounding text. Can I somehow get just the text directly before and after the anchor tag?

CodePudding user response:

Since you're using Selenium, I'm assuming XPath 1.0.

This should select <a> elements that lack a space immediately before or immediately after them:

//a[substring(preceding-sibling::text()[1],string-length(preceding-sibling::text()[1]) )!=' ' or substring(following-sibling::text()[1],1,1)!=' ']
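
To sanity-check what that expression is meant to flag without starting a browser, here is a minimal offline sketch of the same rule. It swaps in the standard library's xml.etree.ElementTree instead of Selenium, so it assumes the snippet is well-formed XML. The function name anchors_missing_space is hypothetical, and unlike the XPath it only looks at text that directly touches the anchor.

```python
import xml.etree.ElementTree as ET


def anchors_missing_space(markup):
    """Return the text of each <a> lacking a space directly before or after it.

    Hypothetical helper: assumes `markup` is well-formed XML.
    """
    root = ET.fromstring(markup)
    flagged = []
    for parent in root.iter():
        # Text that precedes the first child of this parent.
        before = parent.text or ""
        for child in parent:
            # Text that follows this child, up to the next element.
            after = child.tail or ""
            # Same test as the XPath: last character before and first
            # character after must each be a literal space.
            if child.tag == "a" and (before[-1:] != " " or after[:1] != " "):
                flagged.append(child.text)
            # This child's tail is the text preceding the next sibling.
            before = after
    return flagged


sample = '<p>Click on this<a href="#">link</a>to access the data</p>'
print(anchors_missing_space(sample))
```

Like the XPath, this compares against a literal space only, so a tab or newline next to the link would still be flagged.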

CodePudding user response:

You can try this, which selects the parent of each <a> element:

//a/..

Output:

Click on thislinkto access the data

Output-2, selecting the <a> element itself:

  //a/../a

link

CodePudding user response:

As per the HTML, the parent <p> element contains two text nodes, "Click on this" and "to access the data", with an <a> element between them.

<p>
    Click on this
    <a href="#">link</a>
    to access the data
</p>

Solution

To print the text before the <a> element, wait for the parent <p> element with WebDriverWait and visibility_of_element_located(), then use either of the following approaches:

  • Using Python, XPATH, firstChild and strip():

    print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[contains(., 'link')]/ancestor::p[1]")))).strip())

  • Using Python, XPATH, get_attribute() and splitlines():

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[contains(., 'link')]/ancestor::p[1]"))).get_attribute("innerHTML").splitlines()[1])
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
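
Since the original question is about whitespace, a variation on the execute_script() idea is to read the text nodes on both sides of each anchor without calling strip(). The following is only a sketch under some assumptions: the names NEIGHBOUR_TEXT_JS, spacing_report and check_link are hypothetical, driver is an already running WebDriver session, and link is a WebElement for an <a> tag.

```python
# JavaScript run in the browser: returns the text nodes directly adjacent
# to the element passed as arguments[0], or '' where the neighbour is
# missing or is not a text node.
NEIGHBOUR_TEXT_JS = """
const a = arguments[0];
const text = n => (n && n.nodeType === Node.TEXT_NODE) ? n.textContent : '';
return [text(a.previousSibling), text(a.nextSibling)];
"""


def spacing_report(before, after):
    """Report which sides of a link lack whitespace (hypothetical helper)."""
    return {
        # An empty string also counts as missing whitespace.
        "missing_before": not before[-1:].isspace(),
        "missing_after": not after[:1].isspace(),
    }


def check_link(driver, link):
    """Fetch the raw text around `link` and classify it.

    Assumes `driver` exposes Selenium's execute_script() and that `link`
    is a WebElement located by the caller.
    """
    before, after = driver.execute_script(NEIGHBOUR_TEXT_JS, link)
    return spacing_report(before, after)
```

With a live session this could be called as check_link(driver, link) for each link in driver.find_elements(By.XPATH, "//a"). Note that isspace() also accepts tabs and newlines, which is looser than comparing against a single space character.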
    