Any help is appreciated.
I need help using Selenium to scrape the list of cars from the CarMax site. url = 'https://www.carmax.com/cars?includenontransferables=False&year=2018-2023&mileage=30000&price=18000-30000'
Outside of Selenium, I am able to submit the URL (via Chrome on a Mac) and then click "SEE MORE MATCHES" multiple times. It adds 22 car tiles each time. I want to get the full 228 cars that match the filter.
When I use Selenium, I get the initial page with a list of 22 tiles (cars). But when I manually click "SEE MORE MATCHES" (inside the Selenium browser), I get "We're Sorry, an error occurred".
So in the Selenium browser window I manually pasted the URL and got this message:
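As a quick sanity check on those numbers, here is a small sketch of how many "SEE MORE MATCHES" clicks it should take to reveal every tile. The helper name `clicks_needed` is made up for illustration, and it assumes the first page already shows one full batch.

```python
def clicks_needed(total_results, page_size):
    """Clicks required after the first batch of tiles is already displayed."""
    if total_results <= page_size:
        return 0
    remaining = total_results - page_size
    # Integer ceiling division: a partial final batch still needs one click.
    return (remaining + page_size - 1) // page_size


print(clicks_needed(228, 22))  # 10 clicks: 11 batches of 22 cover 228 cars
```

So a loop over the button should stop after roughly ten clicks for this filter.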
Access Denied
You don't have permission to access "http://www.carmax.com/cars?" on this server.
Reference #18.61f1eb8.1664947333.87596fdb
Below is the code I am trying to run to loop through all the pages and see all 228 car tiles.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import time
# The following works and I see a list of cars
browser = webdriver.Chrome()
browser.get('https://www.carmax.com/cars?includenontransferables=False&year=2018-2023&mileage=30000&price=18000-30000')
# The following works because the "SEE MORE MATCHES" button at the bottom is displayed in the browser
e = browser.find_element(By.ID, "see-more")
eBut = e.find_element(By.XPATH, ".//a")
print(eBut.text)
# The following works because the button lights up in blue
hover = ActionChains(browser).move_to_element(eBut)
hover.perform()
# The following causes an error: "We're sorry, An error occurred in your search."
eBut.click()
time.sleep(3)
CodePudding user response:
You should use Options to remove all the hints that indicate you are an automated bot. The site is simply freezing your session when its JavaScript verifies these flags. When initializing your driver, use the following code and you should be fine:
options = Options()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
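# Assumes chromedriver is on PATH (Selenium 4.6+ can also locate it automatically);
# recent Selenium 4 releases no longer accept the driver path as a positional argument.
driver = selenium.webdriver.Chrome(options=options)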
The complete code would be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import time
import bs4
# Spawn WebDriver with the automation hints removed:
options = Options()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
# Assumes chromedriver is on PATH (Selenium 4.6+ can also locate it automatically)
driver = webdriver.Chrome(options=options)
# Go to the page:
driver.get("https://www.carmax.com/cars?includenontransferables=false&year=2018-2023&mileage=30000&price=18000-30000")
wait = WebDriverWait(driver, 600)
# Wait until the "See More" link is clickable, then click it:
ef = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="see-more"]/div/a')))
time.sleep(2)
ef.click()
# Parse the page with bs4 (the "lxml" parser requires the lxml package):
soup = bs4.BeautifulSoup(driver.page_source, "lxml")
# Repeat the process...
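The "Repeat the process..." step could be sketched as a loop that keeps clicking until the button stops appearing. This is only an illustration under a few assumptions: `click_until_gone` and `selenium_finder` are made-up helper names, the XPath is the one used above, and the button is assumed to vanish once every tile has loaded. A `max_clicks` cap guards against looping forever if that assumption is wrong.

```python
import time


def click_until_gone(find_button, pause=0.0, max_clicks=50):
    """Click whatever find_button() returns until it returns None.

    find_button is a caller-supplied callable (hypothetical interface):
    it returns a clickable element, or None when the button is gone.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break
        button.click()
        clicks += 1
        time.sleep(pause)  # give the next batch of tiles time to load
    return clicks


def selenium_finder(driver, xpath='//*[@id="see-more"]/div/a', timeout=10):
    """Build a find_button callable for a Selenium driver (Selenium imported lazily)."""
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def find_button():
        try:
            return WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, xpath))
            )
        except TimeoutException:
            return None  # button did not become clickable within the timeout

    return find_button
```

With the driver from the code above, something like `click_until_gone(selenium_finder(driver), pause=2)` would stand in for the single click, after which `driver.page_source` can be handed to bs4 once.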
CodePudding user response:
I used another automation library to solve this problem; it automates the user's regular browser rather than going through Selenium's WebDriver.
from time import sleep
from clicknium import clicknium as cc
# Install the browser extension if it is missing
if not cc.chrome.extension.is_installed():
    cc.chrome.extension.install_or_update()
tab = cc.chrome.open("https://www.carmax.com/cars?includenontransferables=false&year=2018-2023")
tab.wait_appear_by_xpath('//*[@id="see-more"]/div/div/span[1]', wait_timeout=5)
# Keep clicking "See More" for as long as the link exists
while tab.is_existing_by_xpath('//*[@id="see-more"]/div/a'):
    tab.find_element_by_xpath('//*[@id="see-more"]/div/a').click()
    sleep(3)
CodePudding user response:
The answer about setting Options to remove all hints of a bot worked perfectly (for now :-)