I'm currently working on some web scraping using Python and Selenium, and I can't seem to pull the link from the href of an anchor tag inside an element with a specific class. For reference, it's from Zillow (specifically, this URL:
I've tried a few different options to select the anchor tag listed, but I can't seem to return the information I need:
links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
-- returns
None
I also tried:
links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
-- returns
None
links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
-- returns
None
And lastly:
links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))
I know I can pull all the anchor tags, but surely there is a step I'm missing to get the value of the nested anchor tag? Or am I targeting the wrong class? I'm not sure where I'm going wrong.
Answer:
You could use XPATH to find the link (the a tag) and then call get_attribute('href') on it to get the URL.
Like this:
href = driver.find_element(By.XPATH, '//div[@]/a').get_attribute('href')
print(href)
Another example:
href = driver.find_element(By.XPATH, '//div[@]/a').get_attribute('href')
print(href)
If you want to use By.CLASS_NAME, you could do it like this:
card = driver.find_element(By.CLASS_NAME, "list-card-top")  # the container div
href = card.find_element(By.TAG_NAME, 'a').get_attribute('href')  # href of the nested <a>
print(href)
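As a rough offline illustration (not Selenium itself), this sketch uses Python's standard-library ElementTree on a hypothetical, simplified snippet of markup, not Zillow's real HTML. `find_by_class` is a made-up helper that imitates how By.CLASS_NAME matches one class token, and `card.find(".//a")` plays the role of `find_element(By.TAG_NAME, 'a')` on the container. ElementTree needs well-formed XML, so it could not parse a real page directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not Zillow's actual HTML.
SAMPLE = """
<div class="list-card-top">
  <a class="list-card-link list-card-link-top-margin" href="https://example.com/home/1">Home 1</a>
</div>
"""

def find_by_class(root, class_name):
    """Hypothetical helper imitating By.CLASS_NAME: return the first element
    (root included, document order) whose class attribute contains
    class_name as one whitespace-separated token, or None."""
    for el in root.iter():
        if class_name in el.get("class", "").split():
            return el
    return None

root = ET.fromstring(SAMPLE)
card = find_by_class(root, "list-card-top")  # the container div
link = card.find(".//a")                     # first descendant <a>
print(link.get("href"))                      # https://example.com/home/1

# A space-separated value is not a single token, so nothing matches.
print(find_by_class(root, "list-card-link list-card-link-top-margin"))  # None
```

The last line hints at why passing "list-card-link list-card-link-top-margin" to By.CLASS_NAME does not behave as one might expect: Selenium expects a single class name there.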
In your case:
links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
You're trying to read an attribute named 'href' from the div element with class list-card-info. That div has no href, so get_attribute('href') returns None. What we actually want is the 'href' of the a tag inside that div.
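The difference between asking the div and asking its nested a tag can be reproduced offline. This is a minimal sketch with the standard-library ElementTree standing in for the browser, on hypothetical, simplified markup; ElementTree's `get` returning None for a missing attribute only mirrors, and is not the same API as, Selenium's get_attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not Zillow's actual HTML.
SAMPLE = """
<div class="list-card-info">
  <a class="list-card-link" href="https://example.com/home/1">Home 1</a>
</div>
"""

div = ET.fromstring(SAMPLE)

# The div itself has no href attribute, so the lookup yields None.
print(div.get("href"))            # None

# The href lives on the child <a>, so select that element first.
print(div.find("a").get("href"))  # https://example.com/home/1
```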
Answer:
To print the values of the href attributes, wait with WebDriverWait for visibility_of_all_elements_located() and collect them with a list comprehension, using either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState={"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.62421695117187,"east":-122.24244204882812,"south":37.70334422496088,"north":37.84716973355808},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":true,"filterState":{"fsba":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"fr":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":11}')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]")))])
Using XPATH:
driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState={"pagination":{},"usersSearchTerm":"San Francisco, CA","mapBounds":{"west":-122.62421695117187,"east":-122.24244204882812,"south":37.70334422496088,"north":37.84716973355808},"regionSelection":[{"regionId":20330,"regionType":6}],"isMapVisible":true,"filterState":{"fsba":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"fr":{"value":true},"ah":{"value":true}},"isListVisible":true,"mapZoom":11}')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='list-card-top']/a[@href]")))])
Console Output:
['https://www.zillow.com/homedetails/San-Francisco-CA-94134/15166498_zpid/', 'https://www.zillow.com/b/avery-450-san-francisco-ca-BTfktx/', 'https://www.zillow.com/b/solaire-san-francisco-ca-65g7KK/', 'https://www.zillow.com/homedetails/117-Saint-Charles-Ave-San-Francisco-CA-94132/15195262_zpid/', 'https://www.zillow.com/homedetails/433-40th-Ave-San-Francisco-CA-94121/15092586_zpid/', 'https://www.zillow.com/homedetails/123-Carl-St-San-Francisco-CA-94117/2078490576_zpid/', 'https://www.zillow.com/b/fifteen-fifty-san-francisco-ca-BdnYPc/', 'https://www.zillow.com/b/l-seven-san-francisco-ca-9NJtD7/', 'https://www.zillow.com/homedetails/4642-18th-St-San-Francisco-CA-94114/332858409_zpid/']
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
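To see what the XPath //div[@class='list-card-top']/a[@href] picks out without launching a browser, here is a rough offline analog using the standard-library ElementTree, which supports a limited XPath subset that includes `[@attr='value']` and `[@attr]` predicates. The markup is hypothetical and simplified, not Zillow's real page, and no waiting is involved because the document is static.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup; not Zillow's actual HTML.
SAMPLE = """
<ul>
  <li><div class="list-card-top"><a href="https://example.com/home/1">Home 1</a></div></li>
  <li><div class="list-card-top"><a href="https://example.com/home/2">Home 2</a></div></li>
  <li><div class="list-card-top"><a>no link</a></div></li>
  <li><div class="list-card-top featured"><a href="https://example.com/home/3">Home 3</a></div></li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Same shape as the answer's XPath: a direct <a> child that has an href,
# under a div whose class attribute is exactly "list-card-top".
hrefs = [a.get("href") for a in root.findall(".//div[@class='list-card-top']/a[@href]")]
print(hrefs)  # ['https://example.com/home/1', 'https://example.com/home/2']
```

Note that [@class='list-card-top'] compares the whole attribute string, so the "featured" card is skipped; the CSS selector div[class='list-card-top'] behaves the same way. In a browser, a class selector such as div.list-card-top > a[href] would match any div carrying that class token, including the "featured" one.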