How to pull the href information from specific class using Selenium and Python

Time:04-04

I'm currently working on some web scraping using Python and Selenium, and I can't seem to pull the link from the href of an anchor tag inside a specific class. For reference, the page is from Zillow.

I've tried a few different ways of selecting the anchor tag, but I can't seem to return the information I need:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
# returns None

I also tried:

links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
# returns None

links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
# returns None

And lastly:

links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))

I know I can pull all the anchor tags, but surely there is a step I'm missing to get the value of the nested anchor tag? Or am I targeting the wrong class? I'm not sure where I'm going wrong.

Answer:

You could use XPath to find the link (the a tag) and then call get_attribute('href') on it to get the URL.

Like this:

href = driver.find_element(By.XPATH, '//div[@]/a').get_attribute('href')
print(href)

Another example:

href = driver.find_element(By.XPATH, '//div[@]/a').get_attribute('href')
print(href)
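The XPath idea can be tried offline with Python's standard xml.etree.ElementTree, which supports a subset of XPath including [@attr='value'] and [@attr] predicates. This is a minimal sketch, not Selenium: the markup and example.com URLs are hypothetical stand-ins for a listing card, and real Zillow HTML would need a browser or a proper HTML parser. The path steps from the div down to its child a, and the [@href] predicate skips anchors that have no href.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the class names in the question.
# Real Zillow pages differ and are not well-formed XML.
SAMPLE = """
<ul>
  <li>
    <div class="list-card-top"><a href="https://example.com/home/1">photo</a></div>
    <div class="list-card-info">
      <a class="list-card-link list-card-link-top-margin" href="https://example.com/home/1">address</a>
    </div>
  </li>
  <li>
    <div class="list-card-top">
      <a>placeholder without href</a>
      <a href="https://example.com/home/2">photo</a>
    </div>
  </li>
</ul>
""".strip()


def card_top_hrefs(markup: str):
    """Return the href of every <a> that is a direct child of a
    div whose class attribute is exactly 'list-card-top'."""
    root = ET.fromstring(markup)
    # '/a' selects the child anchor, not the div; '[@href]' drops anchors
    # that have no href attribute at all.
    path = ".//div[@class='list-card-top']/a[@href]"
    return [a.get("href") for a in root.findall(path)]


print(card_top_hrefs(SAMPLE))
# ['https://example.com/home/1', 'https://example.com/home/2']
```

Note that [@class='list-card-top'] is an exact string match on the whole attribute, so it would miss a div that carries extra classes.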

If you want to use By.CLASS_NAME, you could do it like this:

# Locate the container div by its class, then the <a> nested inside it
link = driver.find_element(By.CLASS_NAME, "list-card-top")
href = link.find_element(By.TAG_NAME, 'a').get_attribute('href')
print(href)
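The CLASS_NAME approach is a two-step lookup: find the element carrying the class, then the first a inside it. Below is a rough standard-library analog using ElementTree on hypothetical markup; the helper name first_nested_href is made up for illustration. It matches class_name against the whitespace-separated tokens of the class attribute, which is also why a compound string such as "list-card-link list-card-link-top-margin" does not work where Selenium's By.CLASS_NAME expects a single class name. Unlike Selenium, which raises NoSuchElementException when nothing is found, this sketch simply returns None.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified card markup (not Zillow's real HTML).
SAMPLE = """<div>
  <div class="list-card-top"><a href="https://example.com/home/1">photo</a></div>
  <div class="list-card-info">
    <a class="list-card-link list-card-link-top-margin" href="https://example.com/home/1">address</a>
  </div>
</div>"""


def first_nested_href(root, class_name):
    """Hypothetical helper mimicking find_element(By.CLASS_NAME, ...)
    followed by find_element(By.TAG_NAME, 'a'): take the first element
    (document order) having class_name as one of its class tokens, then
    return the href of its first <a> descendant, or None."""
    for element in root.iter():
        if class_name in (element.get("class") or "").split():
            anchor = element.find(".//a")
            return None if anchor is None else anchor.get("href")
    return None


root = ET.fromstring(SAMPLE)
print(first_nested_href(root, "list-card-top"))   # https://example.com/home/1
print(first_nested_href(root, "list-card-info"))  # https://example.com/home/1
# A compound string is not a single class token, so nothing matches:
print(first_nested_href(root, "list-card-link list-card-link-top-margin"))  # None
```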

In your case:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))

You're reading an attribute named 'href' from the div element with class list-card-info. That div has no href attribute, so get_attribute('href') returns None. What you actually want is the 'href' of the a tag inside that div.
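A minimal illustration of that point, using ElementTree on a hypothetical snippet rather than a live page; Element.get stands in for Selenium's get_attribute here. Asking the container for href gives None, while asking its child a gives the URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical card: the href lives on the <a>, not on the <div>.
card = ET.fromstring(
    '<div class="list-card-info">'
    '<a class="list-card-link" href="https://example.com/home/1">address</a>'
    '</div>'
)

print(card.get("href"))            # None: the div itself has no href
print(card.find("a").get("href"))  # https://example.com/home/1
```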

Answer:

To print the values of the href attributes, use WebDriverWait to wait for visibility_of_all_elements_located(), then collect the attribute with a list comprehension using either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState={"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.62421695117187,"east":-122.24244204882812,"south":37.70334422496088,"north":37.84716973355808},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":true,"filterState":{"fsba":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"fr":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":11}')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]")))])
    
  • Using XPATH:

    driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState={"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.62421695117187,"east":-122.24244204882812,"south":37.70334422496088,"north":37.84716973355808},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":true,"filterState":{"fsba":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"fr":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":11}')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='list-card-top']/a[@href]")))])
    
  • Console Output:

    ['https://www.zillow.com/homedetails/San-Francisco-CA-94134/15166498_zpid/', 'https://www.zillow.com/b/avery-450-san-francisco-ca-BTfktx/', 'https://www.zillow.com/b/solaire-san-francisco-ca-65g7KK/', 'https://www.zillow.com/homedetails/117-Saint-Charles-Ave-San-Francisco-CA-94132/15195262_zpid/', 'https://www.zillow.com/homedetails/433-40th-Ave-San-Francisco-CA-94121/15092586_zpid/', 'https://www.zillow.com/homedetails/123-Carl-St-San-Francisco-CA-94117/2078490576_zpid/', 'https://www.zillow.com/b/fifteen-fifty-san-francisco-ca-BdnYPc/', 'https://www.zillow.com/b/l-seven-san-francisco-ca-9NJtD7/', 'https://www.zillow.com/homedetails/4642-18th-St-San-Francisco-CA-94114/332858409_zpid/']
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
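WebDriverWait(driver, 20).until(...) repeatedly evaluates the expected condition until it returns something truthy or the timeout expires. The sketch below is a simplified, hypothetical model of that polling loop: wait_until and WaitTimeout are made-up names, not Selenium APIs. The real until() passes the driver to the condition, ignores NoSuchElementException by default, and raises Selenium's TimeoutException. Because any falsy result means "keep waiting", an empty list of not-yet-rendered cards does not end the wait early.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value, then return it.
    Raise WaitTimeout if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:  # an empty list is falsy, so polling continues
            return value
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: pretend the listing links only render on the third poll.
calls = {"n": 0}


def fake_visible_links():
    calls["n"] += 1
    return ["https://example.com/home/1"] if calls["n"] >= 3 else []


print(wait_until(fake_visible_links, timeout=5, poll_frequency=0.01))
# ['https://example.com/home/1']
```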