I am trying to append car links from a website into a list. I want to traverse that list to get information from each car's web page.
So far I have tried both the .append method and the = operator, but I get the same error for both:
AttributeError: 'str' object has no attribute 'get_attribute'
This only shows up when I use the following line of code:
carLinks = [carLink.get_attribute("href")]
or the append method. However, if I just print carLink.get_attribute("href"), it prints all the links.
This is the partial code I used:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")
carLinks = []
carLinks = driver.find_elements_by_css_selector("div.grid-box-container a")
for carLink in carLinks:
    carLinkUrl = carLink.get_attribute("href")
    carLinks.append(carLinkUrl)
    # print(carLinkUrl)
print(carLinks)
driver.quit()
I haven't tried it in BeautifulSoup yet as I am not used to mixing both Selenium and BeautifulSoup at once.
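The error can be reproduced without a browser. The sketch below assumes a hypothetical FakeElement class standing in for Selenium's WebElement (only get_attribute is modelled) and example.com URLs in place of the real car pages; it keeps the question's loop shape, where the same name holds the list being iterated and the list being appended to.

```python
# Hypothetical stand-in for Selenium's WebElement, so the loop can run without a browser.
class FakeElement:
    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        # Only "href" is modelled in this sketch.
        return self._href if name == "href" else None


# Same shape as the question: the name carLinks holds the elements...
carLinks = [FakeElement("https://example.com/car/1"),
            FakeElement("https://example.com/car/2")]

error_message = None
try:
    # ...and the loop appends to the very list it is iterating over.
    for carLink in carLinks:
        carLinks.append(carLink.get_attribute("href"))
except AttributeError as exc:
    # After the two elements, the loop reaches the first appended str.
    error_message = str(exc)

print(error_message)  # 'str' object has no attribute 'get_attribute'
print(len(carLinks))  # 4: two elements followed by two href strings
```

Printing inside the loop works because each element does return its href; the failure only happens once the iteration runs past the original elements into the strings it appended.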
User response:
You have to add a wait to let the page elements load before accessing them.
Without it, calling driver.find_elements_by_css_selector("div.grid-box-container a") immediately after driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=") returns an empty list, which is then assigned to carLinks.
This should work better:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(ChromeDriverManager().install())
wait = WebDriverWait(driver, 20)
driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")
# Wait up to 20 s until at least one matching link is clickable.
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.grid-box-container a")))
# Keep the elements under their own name, so the loop never iterates over the URLs it appends.
carElements = driver.find_elements(By.CSS_SELECTOR, "div.grid-box-container a")
carLinks = []
for carLink in carElements:
    carLinkUrl = carLink.get_attribute("href")
    carLinks.append(carLinkUrl)
    # print(carLinkUrl)
print(carLinks)
driver.quit()
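Selenium's WebDriverWait.until repeatedly calls the expected condition (every 0.5 seconds by default) until it returns something truthy, and raises TimeoutException if the timeout expires first. The sketch below is a simplified pure-Python illustration of that polling idea, not Selenium's implementation; wait_until, find_links and checks are hypothetical names, and the fake "page" only shows its link on the third check to mimic content rendered after driver.get().

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Simplified illustration only: Selenium's WebDriverWait additionally
    ignores NoSuchElementException by default and raises TimeoutException.
    """
    end = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Hypothetical "page" whose link only appears on the third check.
checks = {"count": 0}


def find_links():
    checks["count"] += 1
    return ["https://example.com/car/1"] if checks["count"] >= 3 else []


links = wait_until(find_links, timeout=5, poll_frequency=0.01)
print(links, checks["count"])  # ['https://example.com/car/1'] 3
```

An explicit wait like this returns as soon as the content is there, which is usually preferable to a fixed time.sleep() that either wastes time or is too short.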
User response:
This is because you have a list named carLinks, and in your loop:
for carLink in carLinks:
    carLinkUrl = carLink.get_attribute("href")
    carLinks.append(carLinkUrl)
you use the same name for the list of web elements returned by find_elements_by_css_selector.
Since carLinks is rebound to that element list, every append adds a URL string to the very list the loop is iterating over. Once the loop has gone through the original web elements, it reaches those strings, so carLink is a str, and a str has no get_attribute method.
Please change one of the names:
carLinks = []
# A separate name for the web elements, so the loop never reaches the appended URL strings.
links = driver.find_elements_by_css_selector("div.grid-box-container a")
for car_link in links:
    carLinks.append(car_link.get_attribute('href'))
print(carLinks)
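With the names separated, the collection step can also be written as a list comprehension. This is a minimal sketch assuming a hypothetical FakeElement class in place of Selenium's WebElement and example.com URLs in place of the real links; with a real driver, links would come from the find_elements call above.

```python
class FakeElement:
    """Hypothetical stand-in for Selenium's WebElement (only get_attribute)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


# Stands in for the element list returned by the find_elements call.
links = [FakeElement("https://example.com/car/1"),
         FakeElement("https://example.com/car/2")]

# Build the URL list under a different name, so the iterated list never grows.
carLinks = [link.get_attribute("href") for link in links]

print(carLinks)    # ['https://example.com/car/1', 'https://example.com/car/2']
print(len(links))  # 2: the element list is left untouched
```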
User response:
I noticed your list carLinks shares its name with the result of driver.find_elements_by_css_selector("div.grid-box-container a"). At first that name refers to an empty list, to which you can append. But before you append anything, you rebind the name to the list of web elements returned by Selenium, so the href strings you append end up in that same element list.
Could this be the issue? I'd suggest renaming the list.
Quick side note: check whether the website allows web scraping. I recall a site called autoscout having some legal issues over something similar.
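The rebinding this answer describes comes down to Python names being references to objects. In the sketch below the strings "element-1" and "element-2" are hypothetical placeholders for web elements; it shows that after the second assignment both names point at one list, so appending through carLinks changes the element list.

```python
elements = ["element-1", "element-2"]  # hypothetical placeholders for WebElements

carLinks = []          # name bound to a new empty list
carLinks = elements    # name rebound: the empty list is simply discarded

print(carLinks is elements)  # True: both names refer to the same list object

carLinks.append("https://example.com/car/1")
print(elements)  # ['element-1', 'element-2', 'https://example.com/car/1']
```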
User response:
So I found this link where the author used a range() for loop rather than iterating directly over the list of web elements. Maybe it is the name clash cruisepandey pointed out, or maybe the delay was too short like Pandey said. Either way, it works fine now.
I changed the code to this:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.tred.com/buy?body_style=&distance=50&exterior_color_id=&make=&miles_max=100000&miles_min=0&model=&page_size=24&price_max=100000&price_min=0&query=&requestingPage=buy&sort=desc&sort_field=updated&status=active&year_end=2022&year_start=1998&zip=")
carLinks = []
carLinks = driver.find_elements_by_css_selector("div.grid-box-container a")
# range(len(carLinks)) is evaluated once, so the loop stops after the original
# web elements and never calls get_attribute on the appended strings.
for i in range(len(carLinks)):
    carLinks.append(carLinks[i].get_attribute('href'))
# Note: carLinks now holds the web elements followed by their href strings.
print(carLinks)
driver.quit()
I even removed the carLink variable to make it shorter.
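The range() version avoids the error because len(carLinks) is read once, before the loop starts, so the index never reaches the appended strings. A side effect is that the list ends up mixing elements and URLs. The sketch below assumes a hypothetical FakeElement stand-in for Selenium's WebElement and example.com URLs, and filters out the strings before the next step of visiting each car page.

```python
class FakeElement:
    """Hypothetical stand-in for Selenium's WebElement (not a Selenium class)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        # Only "href" is modelled in this sketch.
        return self._href if name == "href" else None


carLinks = [FakeElement("https://example.com/car/1"),
            FakeElement("https://example.com/car/2")]

# range(len(carLinks)) is computed once, before the first iteration,
# so i only takes the values 0 and 1 even though the list keeps growing.
for i in range(len(carLinks)):
    carLinks.append(carLinks[i].get_attribute("href"))

print(len(carLinks))  # 4: two elements followed by two href strings

# Keep only the URL strings for the next step (visiting each car page).
urls = [item for item in carLinks if isinstance(item, str)]
print(urls)  # ['https://example.com/car/1', 'https://example.com/car/2']
```

Using a separate list for the URLs, as in the renaming answers, avoids the cleanup step entirely.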