Extract partial text from element link with python/selenium

Time:03-23

In the HTML below, my goal is to return zzde7e35d-8d9d-4763-95d2-9198684abb12:

<div class = container>    
    <a  data-type="patch" data-disable-with="Waiting" href="/market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12">Yes</a>
</div>

The problem is that I can't even seem to locate the URL within the div:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

link = example.url
driver.get(link)
URL = driver.find_element_by_xpath('//a[contains(@href,"market")]')
print(URL)

Printing the above, I seem to get a bunch of random characters unrelated to the HTML at all, let alone the URL in question.

If it simplifies the issue, the value I want will always be the same length. Is indexing an easy workaround?

CodePudding user response:

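Printing the element itself only shows the WebElement object, not the link. To get the link, call get_attribute('href') on the element. Selenium returns the href resolved to an absolute URL, so the result ends in /market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12. Split it on "/" and take the last element.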

link = example.url
driver.get(link)
URL = driver.find_element_by_xpath('//a[contains(@href,"market")]')
print(URL.get_attribute('href').split("/")[-1])

Output:

zzde7e35d-8d9d-4763-95d2-9198684abb12
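
The splitting step can be tried without a browser. Below is a minimal sketch, assuming the href may arrive either as the relative path from the question or as an absolute URL with a query string; the helper name last_path_segment and the host example.com are illustrative, not part of the original post.

```python
from urllib.parse import urlsplit


def last_path_segment(href: str) -> str:
    """Return the final non-empty path segment of a relative or absolute URL."""
    # urlsplit separates the path from any scheme, host, query string or fragment.
    path = urlsplit(href).path
    # Drop a trailing slash so "/a/b/" yields "b" rather than "".
    return path.rstrip("/").rsplit("/", 1)[-1]


# The relative href exactly as written in the question's HTML.
print(last_path_segment("/market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12"))

# The same link on a hypothetical host, with a query string appended.
print(last_path_segment(
    "https://example.com/market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12?ref=1"
))
```

Taking the last segment avoids relying on the ID having a fixed length, so it keeps working if the ID format changes.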

CodePudding user response:

You are possibly missing a delay: the element may not have loaded yet when find_element runs.
Instead of

URL = driver.find_element_by_xpath('//a[contains(@href,"market")]')

Try using

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//a[contains(@href,"market")]'))).get_attribute("href")
print(URL)

You also have to extract the href attribute value from the returned web element object, as shown in the code.
If this still does not work, check whether the element you are trying to access is inside an iframe, or whether the locator matches more than one element.
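
The two checks mentioned above (locator uniqueness and iframes) can be sketched as follows. This is only an illustration: the helper names count_matches and find_frame_with_match are made up for this example, and it assumes driver is an already created Selenium WebDriver and that only top-level iframes need to be searched.

```python
# "xpath" is the string value of selenium's By.XPATH constant; it is spelled
# out here so the sketch has no import-time dependency on selenium.
XPATH = "xpath"
LOCATOR = '//a[contains(@href,"market")]'


def count_matches(driver, xpath=LOCATOR):
    """Return how many elements in the current browsing context match xpath."""
    # find_elements (plural) returns a list and does not raise when it is empty.
    return len(driver.find_elements(XPATH, xpath))


def find_frame_with_match(driver, xpath=LOCATOR):
    """Return the index of the first top-level iframe containing a match, or None."""
    frames = driver.find_elements(XPATH, "//iframe")
    for index in range(len(frames)):
        # switch_to.frame accepts a zero-based frame index.
        driver.switch_to.frame(index)
        try:
            if count_matches(driver, xpath) > 0:
                return index
        finally:
            # Always go back to the top-level document before moving on.
            driver.switch_to.default_content()
    return None
```

If count_matches(driver) returns 0 in the main document but find_frame_with_match(driver) returns an index, call driver.switch_to.frame(index) before locating the link. If count_matches returns more than 1, the locator is not unique and should be made more specific.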

CodePudding user response:

To print the last part of the href attribute, i.e. zzde7e35d-8d9d-4763-95d2-9198684abb12, wait for the element with WebDriverWait and visibility_of_element_located(). You can use any of the following locator strategies:

  • Using LINK_TEXT:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.LINK_TEXT, "Yes"))).get_attribute("href").split("/")[-1])
    
  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[data-type='patch'][data-disable-with='Waiting'][href*='market']"))).get_attribute("href").split("/")[-1])
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@data-type='patch' and @data-disable-with='Waiting' and contains(@href, 'market')]"))).get_attribute("href").split("/")[-1])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
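
For readers unfamiliar with explicit waits, the polling idea behind WebDriverWait(...).until(...) can be illustrated in plain Python. This is a conceptual sketch only, not Selenium's implementation; poll_until and link_is_ready are hypothetical names, and the "page" is simulated by a counter.

```python
import time


def poll_until(condition, timeout=20.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for a page whose link only appears on the third check.
attempts = {"count": 0}


def link_is_ready():
    attempts["count"] += 1
    if attempts["count"] >= 3:
        return "/market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12"
    return None


href = poll_until(link_is_ready, timeout=2.0, interval=0.01)
print(href.split("/")[-1])
```

A single find_element call corresponds to checking the condition once, which is why it can fail on a page that is still loading.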
    

You can find a relevant detailed discussion in "Find div aria label starting with certain text and then extract".
