I'm using selenium for web scraping, and I have the following HTML:
<a data-field="---" target="---" href="https://www.example.com/0000/">
<div >
<span >
How can I access the href link and click on it? I used the following and didn't get results, would be great to understand why:
browser.find_element(By.PARTIAL_LINK_TEXT, "https://www.example.com")
browser.find_element(By.XPATH,"//a[contains(text(),'https://www.example.com')]")
Thanks!
Edit: The page I'm working on is the LinkedIn interests page (companies that I follow). You can find it on: https://www.linkedin.com/in/yourusername/details/interests/?detailScreenTabIndex=1
For each company I follow, there is an HTML element like this:
<a data-field="active_tab_companies_interests" target="_self" href="https://www.linkedin.com/company/1016/">
<div >
<span >
<span aria-hidden="true"><!---->GE Healthcare<!----></span><span ><!---->GE Healthcare<!----></span>
</span>
<!----><!----><!----> </div>
<!----> <span >
<span aria-hidden="true"><!---->1,851,945 followers<!----></span><span ><!---->1,851,945 followers<!----></span>
</span>
<!----> </a>
I want to find href, in my example: "https://www.linkedin.com/company/1016/"
The code I wrote (with the help of the comments):
# log in
browser.get("https://www.linkedin.com")
username = browser.find_element(By.ID,"session_key")
username.send_keys("youremail")
password = browser.find_element(By.ID,"session_password")
password.send_keys("yourpassword")
login_button = browser.find_element(By.CLASS_NAME, "sign-in-form__submit-button")
login_button.click()
# companies I follow on Linkedin
browser.get("https://www.linkedin.com/in/username/details/interests/?detailScreenTabIndex=1")
# find all company links
wait = WebDriverWait(browser, 20)
company_page = browser.find_elements(By.XPATH,"//a[contains(@href,'https://www.linkedin.com/company/')]")
for x in range(len(company_page)):
    print(company_page[x].text)
The output for "GE healthcare" (from the HTML snippet) is: GE Healthcare GE Healthcare 1,852,718 followers 1,852,718 followers
and not the href link that I'm looking for. I don't understand why it finds these texts and not the link. Thanks!
Answer:
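https://www.example.com/0000/ is not the text content of the link. It is the value of the href attribute.
That is why both of your locators fail: By.PARTIAL_LINK_TEXT and the XPath text() function look at the element's text, not at its attributes.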
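To make the difference concrete, here is a small sketch that needs no browser. It parses a simplified version of the anchor from the question with Python's standard html.parser and records the href attribute and the text separately. The AnchorInspector class and the SAMPLE string are made up for illustration; they are not part of Selenium.

```python
from html.parser import HTMLParser


class AnchorInspector(HTMLParser):
    """Records each <a> element's href attribute and its text separately."""

    def __init__(self):
        super().__init__()
        self.anchors = []      # one dict per anchor: {"href": ..., "text": ...}
        self._current = None   # the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"href": dict(attrs).get("href"), "text": ""}

    def handle_data(self, data):
        # Text nodes inside the anchor are what link-text locators see.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = " ".join(self._current["text"].split())
            self.anchors.append(self._current)
            self._current = None


# Simplified stand-in for the markup in the question (an assumption).
SAMPLE = (
    '<a target="_self" href="https://www.linkedin.com/company/1016/">'
    "<div><span>GE Healthcare</span></div>"
    "<span>1,851,945 followers</span>"
    "</a>"
)

inspector = AnchorInspector()
inspector.feed(SAMPLE)
inspector.close()
anchor = inspector.anchors[0]

url = "https://www.linkedin.com/company/"
print("URL found in text:", url in anchor["text"])
print("URL found in href:", url in anchor["href"])
```

The URL only shows up in the href value, so a locator has to test the attribute (@href) rather than the text.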
Please try this:
browser.find_element(By.XPATH,"//a[contains(@href,'https://www.example.com')]")
Appending .click() to that call clicks the element:
browser.find_element(By.XPATH,"//a[contains(@href,'https://www.example.com')]").click()
You will probably need to wait until the element is clickable. WebDriverWait with expected conditions is the right way to do that:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(browser, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@href,'https://www.example.com')]"))).click()
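Regarding the edit in the question: .text returns the visible text of an element, which is why the loop prints the company name and follower count. The link itself is an attribute, so it has to be read with get_attribute("href"). Below is a minimal sketch of that idea. The collect_hrefs helper and the FakeLink class are hypothetical names; FakeLink only stands in for a Selenium WebElement so the example runs without a browser.

```python
def collect_hrefs(elements):
    """Return the href attribute of every element that has one."""
    hrefs = []
    for element in elements:
        # get_attribute reads the attribute value; .text would give visible text.
        href = element.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


class FakeLink:
    """Hypothetical stand-in for a WebElement, used only for this demo."""

    def __init__(self, href, text):
        self._href = href
        self.text = text

    def get_attribute(self, name):
        return self._href if name == "href" else None


links = [
    FakeLink(
        "https://www.linkedin.com/company/1016/",
        "GE Healthcare 1,851,945 followers",
    )
]
print(collect_hrefs(links))
```

With a real driver, the same helper should work on the list from the question, for example collect_hrefs(browser.find_elements(By.XPATH, "//a[contains(@href,'https://www.linkedin.com/company/')]")), assuming the page has finished loading the company list.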