How can I access href that is inside a data-field, using Selenium (Python)?

I'm using selenium for web scraping, and I have the following HTML:

<a data-field="---"  target="---" href="https://www.example.com/0000/">
        <div >
            <span >

How can I access the href link and click on it? I tried the following two calls, but neither found the element, and I would like to understand why:

browser.find_element(By.PARTIAL_LINK_TEXT, "https://www.example.com")

browser.find_element(By.XPATH,"//a[contains(text(),'https://www.example.com')]")

Thanks!

Edit: The page I'm working on is the LinkedIn interests page (the companies that I follow). You can find it at: https://www.linkedin.com/in/yourusername/details/interests/?detailScreenTabIndex=1

For each company I follow, there is an HTML element like this:

<a data-field="active_tab_companies_interests"  target="_self" href="https://www.linkedin.com/company/1016/">
        <div >
            <span >
              <span aria-hidden="true"><!---->GE Healthcare<!----></span><span ><!---->GE Healthcare<!----></span>
            </span>
<!----><!----><!---->        </div>
<!---->          <span >
            <span aria-hidden="true"><!---->1,851,945 followers<!----></span><span ><!---->1,851,945 followers<!----></span>
          </span>
<!---->      </a>

I want to find href, in my example: "https://www.linkedin.com/company/1016/"

The code I wrote (with the help of the comments):

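from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# log in (browser is an already-created WebDriver instance)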
browser.get("https://www.linkedin.com")
username = browser.find_element(By.ID,"session_key")
username.send_keys("youremail")
password = browser.find_element(By.ID,"session_password")
password.send_keys("yourpassword")
login_button = browser.find_element(By.CLASS_NAME, "sign-in-form__submit-button")
login_button.click()


# companies I follow on Linkedin
browser.get("https://www.linkedin.com/in/username/details/interests/?detailScreenTabIndex=1")
# find all company links
wait = WebDriverWait(browser, 20)
company_page = browser.find_elements(By.XPATH,"//a[contains(@href,'https://www.linkedin.com/company/')]")


# print the visible text of every matched link
for company in company_page:
    print(company.text)

The output for "GE Healthcare" (from the HTML snippet) is: GE Healthcare GE Healthcare 1,852,718 followers 1,852,718 followers

and not the href link that I'm looking for. I don't understand why it finds these texts and not the link. Thanks!

CodePudding user response:

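https://www.example.com/0000/ is not the text content of the element. It is the value of its href attribute. By.PARTIAL_LINK_TEXT and the XPath text() function both match the visible text of the link, not its attributes, which is why both of your locators fail.
Please try matching on the attribute instead: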

browser.find_element(By.XPATH,"//a[contains(@href,'https://www.example.com')]")

Appending .click() will click that element once it has been found:

browser.find_element(By.XPATH,"//a[contains(@href,'https://www.example.com')]").click()

You will probably need to wait until the element is clickable before clicking it. WebDriverWait combined with expected conditions is the right way to do that:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 20)

wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@href,'https://www.example.com')]"))).click()
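
Regarding the edit in the question: .text returns the visible text of an element, which is why the loop prints the company name and the follower count. The URL lives in the href attribute and is read with get_attribute("href"). Below is a minimal sketch of that, not a definitive implementation. The helper names collect_hrefs and company_links are hypothetical, browser is assumed to be a logged-in WebDriver that is already on the interests page, and the XPath is the one from the question.

```python
def collect_hrefs(elements):
    """Return the href attribute of each element, skipping elements without one."""
    hrefs = []
    for element in elements:
        # get_attribute reads an attribute value; .text would return the visible text
        href = element.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


def company_links(browser, timeout=20):
    """Hypothetical helper: wait for the company anchors, then return their hrefs."""
    # Imported here so that collect_hrefs can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, "//a[contains(@href,'https://www.linkedin.com/company/')]")
    # Wait until at least one matching anchor is present in the DOM.
    anchors = WebDriverWait(browser, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
    return collect_hrefs(anchors)
```

With a live session this would be used as: for link in company_links(browser): print(link). To open one of the pages, passing the URL to browser.get(link) is likely to be more robust than clicking the anchor, since the elements found earlier can go stale after navigating away.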