I apologize if this seems like an easy fix, but I don't understand why it is not working. I am trying to find and click the link:
<a href="#/documents/2077"
Starting from that URL, I have tried a few things, including the following.
#1
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT,"COSEWIC-Assessment-and-status-report")))
and
appraisal_html = driver.find_element(By.PARTIAL_LINK_TEXT, "COSEWIC-Assessment-and-status-report")
And #2
soup = bs(req.text,'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))
I have tried other things as well. Keep in mind that this is a generalized search: the species name will change every time I run it, but everything else should stay the same.
The second attempt comes straight from the Beautiful Soup documentation. It finds a whole bunch of links, such as the ones under the menu tab, but not the href I am looking for.
The first attempt just times out without finding the partial text I supplied. Maybe this is because partial link text matches the visible text on the page, not the href itself?
One solution I am now thinking of is to first find the bounding box that contains the link and then search for the link within that smaller area, but I still don't understand why I am unable to find the right link from the entire page.
I hope this makes sense. Thank you!
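A likely reason the second attempt misses the link: the `#/documents/...` href and the `data-v-...` attributes in the HTML quoted in the second answer suggest the results are rendered client-side by JavaScript, so the raw HTML fetched with requests probably never contains them. One workaround is to let Selenium render the page and pass `driver.page_source` to Beautiful Soup. The sketch below shows only the parsing step; `document_links` is a hypothetical helper, and the `#/documents/` prefix is an assumption taken from the href in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def document_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <a> whose href starts with '#/documents/'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("#/documents/"):
            # A fragment-only href resolves against the page URL, replacing its fragment
            links.append(urljoin(base_url, href))
    return links


if __name__ == "__main__":
    # Static sample markup standing in for a rendered page
    sample = (
        '<a href="#/home">Menu</a>'
        '<a class="card-header" href="#/documents/2077">'
        "<span>COSEWIC Assessment and Status Report</span></a>"
    )
    print(document_links(sample, "https://species-registry.canada.ca/index-en.html"))
```

With a live browser, this would be called as `document_links(driver.page_source, driver.current_url)` after waiting for the results to load; each returned URL can then be opened with `driver.get()`, which should let the site's router handle the new fragment.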
Answer 1:
Try this:
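from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
chrome_options = Options()
#chrome_options.add_argument("--headless")
#chrome_options.add_argument("user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
# Selenium 4 takes the chromedriver path through a Service object (executable_path was deprecated and later removed)
driver = webdriver.Chrome(service=Service("./chromedriver"), options=chrome_options)
driver.get("https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria's Owl-clover")
# crude fixed wait so the JavaScript-rendered search results can appear
time.sleep(2)
# click the first search-result card (find_element_by_xpath no longer exists in current Selenium 4)
driver.find_element(By.XPATH, "//a[@class='card-header']").click()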
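Since the question says the species name changes on every search, the hardcoded URL above could be generated instead. This is a minimal sketch, assuming the site keeps accepting the same query parameters; `documentTypeId=18` and the other values are copied from the URL in this answer and their meaning is not confirmed here. `search_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

# Base page and hash route copied from the URL used in this answer
BASE_URL = "https://species-registry.canada.ca/index-en.html#/documents"


def search_url(species_name: str, document_type_id: int = 18, page_size: int = 10) -> str:
    """Build the documents search URL for one species (hypothetical helper)."""
    params = {
        "documentTypeId": document_type_id,
        "sortBy": "documentTypeSort",
        "sortDirection": "asc",
        "pageSize": page_size,
        "keywords": species_name,
    }
    # quote_via=quote encodes spaces as %20 and apostrophes as %27
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(search_url("Victoria's Owl-clover"))
```

The result would be passed to `driver.get()`. This assumes the site's client-side router decodes percent-encoded keywords, which has not been verified here.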
Answer 2:
A couple of things here:
The string COSEWIC-Assessment-and-status-report isn't the exact link text; the visible text is:
COSEWIC Assessment and Status Report on the Victoria’s Owl-clover
The text is not directly inside the <a> tag but inside a <span>:
<span data-v-7ee3c58f="" >COSEWIC Assessment and Status Report on the Victoria’s Owl-clover <em>Castilleja victoriae</em> in Canada</span>
So, to identify the clickable element, use WebDriverWait to wait for element_to_be_clickable() with the following locator strategy:
Using XPATH:
driver.get("https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria's Owl-clover")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH,"//span[contains(., 'COSEWIC Assessment and Status Report on the Victoria’s Owl-clover')]"))).click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
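To reuse this XPath for other species (the question's generalized search), the title text could be built from the species name. A name containing an ASCII apostrophe would break a single-quoted XPath literal, and XPath 1.0 has no escape syntax, so the fallback below uses concat(). `xpath_literal` and `report_span_xpath` are hypothetical helper names, and the title wording is assumed to follow the same pattern as the Victoria’s Owl-clover report.

```python
def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal (XPath 1.0 has no escape syntax)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: split on ' and glue the pieces back with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def report_span_xpath(species_common_name: str) -> str:
    """XPath for the <span> holding the status-report title (assumed title pattern)."""
    title = f"COSEWIC Assessment and Status Report on the {species_common_name}"
    return f"//span[contains(., {xpath_literal(title)})]"


if __name__ == "__main__":
    print(report_span_xpath("Victoria’s Owl-clover"))
```

The page title uses a typographic apostrophe (’), so "Victoria's" typed with an ASCII apostrophe would not match; the name has to be passed exactly as the site displays it. The returned string slots into the By.XPATH locator of the WebDriverWait call above.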