The code below sometimes returns one (1) element, sometimes all, and sometimes none. For it to work for my application I need it to return all the matching elements in the page
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By
def villanovan():
driver = webdriver.Chrome()
driver.implicitly_wait(10)
url = 'http://webcache.googleusercontent.com/search?q=cache:https://villanovan.com/&strip=0&vwsrc=0'
url_2 = 'https://villanovan.com/'
driver.get(url_2)
a = driver.find_elements(By.CLASS_NAME, "homeheadline")
titles = [i.text for i in a if len(i.text) != 0]
links = [i.get_attribute('href') for i in a if len(i.text) != 0]
return [titles, links]
if __name__ == "__main__":
print(villanovan())
I was expecting a list with multiple links and article titles, but recieved a list with the first element found, not all elements found.
CodePudding user response:
To extract the value of href
attributes you can use list comprehension and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://villanovan.com/") time.sleep(3) print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a.homeheadline[href]")])
Using XPATH:
driver.get("https://villanovan.com/") time.sleep(3) print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//a[@class='homeheadline' and @href]")])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC
Console Output:
['https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/22098/news/decarbonizing-villanova-a-town-hall-on-fossil-fuel-divestment/', 'https://villanovan.com/22096/news/biology-professors-granted-1-million-for-wetlands-research/', 'https://villanovan.com/22093/news/students-create-the-space-supporting-sex-education/', 'https://villanovan.com/22098/news/decarbonizing-villanova-a-town-hall-on-fossil-fuel-divestment/', 'https://villanovan.com/22096/news/biology-professors-granted-1-million-for-wetlands-research/', 'https://villanovan.com/22044/culture/julia-staniscis-leaning-on-letters/', 'https://villanovan.com/22032/culture/villanova-sorority-recruitment-recap/', 'https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/21932/opinion/villanova-should-be-free-for-families-earning-less-than-100000/', 'https://villanovan.com/21897/opinion/grasshoppergate-the-state-of-villanova-dining/', 'https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/22093/news/students-create-the-space-supporting-sex-education/', 'https://villanovan.com/22090/news/mlk-day-of-service/', 'https://villanovan.com/22087/news/university-updates-covid-procedures/']