Getting error in Selenium while scraping Amazon data

Time:10-10

I wrote a program to scrape data from Amazon, but I am getting errors that I don't understand. I am using XPath to locate elements by class, and I am trying to extract book names from an Amazon results page. The script searches Amazon for the keyword "hacking books", and the search itself succeeds, but no results come back afterwards. This is the code I tried:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time as t
import pandas as pd

driver = webdriver.Chrome(executable_path='chromedriver.exe')

wait = WebDriverWait(driver, 5)
url = "https://www.amazon.com"
driver.get(url)

keyword = "hacking books"
search_book = driver.find_element(By.ID,'twotabsearchtextbox')
search_book.send_keys(keyword)
search_button = driver.find_element(By.ID,'nav-search-submit-button')
search_button.click()

big_list = []

while True:
    try:
        # The class value must be quoted inside the XPath, otherwise it is not read as a string.
        items = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//a[@class="alink-normal s-underline-text s-underline-link-text s-link-style a-text-normal"]')))
        for i in items:
            big_list.append((i.text, i.get_attribute('href')))
        next_page_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//span[@class="s-pagination-strip"]//a[contains(text(), "Next")]')))
        next_page_button.location_once_scrolled_into_view
        t.sleep(10)
        next_page_button.click()
        print('clicked, going to next page')
        t.sleep(10)
    except TimeoutException:
        print('all pages done')
        break
df = pd.DataFrame(big_list, columns = ['Book', 'Url'])
print(df)
df.to_csv('hacking_books.csv')
driver.quit()

Can you help me find the bug?

CodePudding user response:

The problem is simple: you are not putting double quotes around the class names. Try:

'//a[@class="alink-normal s-underline-text s-underline-link-text s-link-style a-text-normal"]'

Do the same for the other XPath: wrap its class value in double quotes as well.

CodePudding user response:

Your XPath is not a valid expression. A valid XPath expression looks like this:

Relative path: '//tagname[@attribute="value"]'

So you just have to put the attribute value in double quotes.
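
To illustrate the quoting rule from the answers above, here is a minimal sketch that runs without a browser. It makes several assumptions. The helper xpath_by_class is hypothetical and not part of Selenium. The markup is a made-up stand-in, not Amazon's real HTML. The standard library's ElementTree is used only because it accepts the same quoted attribute predicate.

```python
import xml.etree.ElementTree as ET


def xpath_by_class(tag, class_value):
    """Build //tag[@class="..."] with the attribute value in double quotes.

    Hypothetical helper for illustration; it assumes class_value itself
    contains no double-quote characters.
    """
    return f'//{tag}[@class="{class_value}"]'


# Tiny stand-in for a results page (made-up markup, not Amazon's real HTML).
page = ET.fromstring(
    '<div>'
    '<a class="title link" href="/book-1">Book one</a>'
    '<span class="s-pagination-strip"><a href="/page-2">Next</a></span>'
    '</div>'
)

title_xpath = xpath_by_class("a", "title link")
strip_xpath = xpath_by_class("span", "s-pagination-strip")

# ElementTree only accepts relative paths, hence the leading "." here.
titles = page.findall("." + title_xpath)
strips = page.findall("." + strip_xpath)

print(title_xpath)
print([a.get("href") for a in titles])
print(len(strips))
```

In the Selenium script the same strings would be passed unchanged, for example driver.find_element(By.XPATH, strip_xpath). Note that @class="..." compares the whole attribute string, so the quoted value has to list the classes exactly as they appear in the page source.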
