Find href on web page

I apologize if this seems like an easy fix, but I don't understand why it is not working. I am trying to find and click this link:

<a href="#/documents/2077"

From the URL: https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria's Owl-clover

Starting from that URL, I have tried a few things, including the following. Attempt #1:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT,"COSEWIC-Assessment-and-status-report")))

and

appraisal_html = driver.find_element_by_partial_link_text("COSEWIC-Assessment-and-status-report")

And attempt #2:

soup = bs(req.text,'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))

I have tried other things as well. Keep in mind that this is a generalized search: the species name will change every time I run it, while everything else should stay the same.

The second attempt is straight from the Beautiful Soup documentation. It finds a whole bunch of links, such as the ones under the menu tab, but not the href I am looking for.

The first attempt, for some reason, just times out without finding the partial text I pass in. Maybe this is because that is the text shown on the page and not the href itself?

One solution I am thinking of is to first find the bounding box that contains the link and then search for the link within that smaller area, but I still don't know why I am unable to find the right link from the entire page.

I hope this makes sense. Thank you!

CodePudding user response：

Try this:

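from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time


chrome_options = Options()
# Uncomment to run without a visible browser window, optionally with a custom user agent.
#chrome_options.add_argument("--headless")
#chrome_options.add_argument("user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")


# executable_path is the Selenium 3 way to point at chromedriver; recent Selenium 4 releases take a Service object instead.
driver = webdriver.Chrome(executable_path="./chromedriver", options=chrome_options)

driver.get("https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria's Owl-clover")
# Give the page a moment to load the results.
time.sleep(2)

# Click the first link with the class card-header.
driver.find_element(By.XPATH, "//a[@class='card-header']").click()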
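
A side note on the URL itself: everything after the # is a fragment, which the browser keeps to itself and never sends in the HTTP request. If the result list is rendered client-side by JavaScript (an assumption, though the URL layout suggests it), that would explain why the requests + BeautifulSoup attempt only sees the static menu links. The sketch below uses only the standard library to build the search URL for any species name and to show that split. build_search_url is a hypothetical helper, and percent-encoding the keywords is an assumption about what the site accepts.

```python
from urllib.parse import quote, urlencode, urlsplit

BASE_URL = "https://species-registry.canada.ca/index-en.html"


def build_search_url(species: str) -> str:
    """Return the document-search URL for a species name (hypothetical helper)."""
    params = {
        "documentTypeId": 18,
        "sortBy": "documentTypeSort",
        "sortDirection": "asc",
        "pageSize": 10,
        "keywords": species,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}#/documents?{query}"


url = build_search_url("Victoria's Owl-clover")
parts = urlsplit(url)
print(parts.path)      # the only path the server is asked for
print(parts.fragment)  # the route and search terms, handled in the browser
```

The returned string can be passed to driver.get() in place of the hard-coded URL, so only the species name changes between searches.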

CodePudding user response：

A couple of things here:

COSEWIC-Assessment-and-status-report isn't the exact text on the page. The actual text is COSEWIC Assessment and Status Report on the Victoria’s Owl-clover

The text is not within the A tag but within a SPAN:

<span data-v-7ee3c58f="" >COSEWIC Assessment and Status Report on the Victoria’s Owl-clover <em>Castilleja victoriae</em> in Canada</span>

So to identify the clickable element, wait for it with WebDriverWait and element_to_be_clickable(). You can use the following locator strategy:

Using XPATH:

driver.get("https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria's Owl-clover")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH,"//span[contains(., 'COSEWIC Assessment and Status Report on the Victoria’s Owl-clover')]"))).click()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
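
Since the species name changes between searches, the XPath can be built from the name instead of being hard-coded. Here is a rough sketch of that idea. xpath_literal and report_xpath are hypothetical helpers, the title pattern "COSEWIC Assessment and Status Report on the <name>" is assumed from this single example, and so is the replacement of the straight apostrophe in the search keyword with the typographic one shown on the page.

```python
def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal, whatever quotes it contains."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: join single-quoted pieces with a quoted "'".
    pieces = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in pieces) + ")"


def report_xpath(species: str) -> str:
    """Build the span locator for a species' report title (assumed pattern)."""
    # Assumption: the page shows a typographic apostrophe, as in this example.
    shown_name = species.replace("'", "\u2019")
    title = f"COSEWIC Assessment and Status Report on the {shown_name}"
    return f"//span[contains(., {xpath_literal(title)})]"


print(report_xpath("Victoria's Owl-clover"))
```

The resulting string can be passed as the By.XPATH locator in the WebDriverWait call above. If other species' pages word the title differently, matching only the fixed prefix "COSEWIC Assessment and Status Report" may be the safer choice.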