Web scraping Amazon and creating a list of dictionaries

Time:01-21

I am having trouble with my code when scraping the Amazon site with Selenium. I want a list of dictionaries, one per book, with the title and author as keys, in the format:

[{TITLE:'x', AUTHOR:'y'},
 {TITLE:'z', AUTHOR:'w'}]

However, it returns a dictionary of lists, with the earlier values repeated, in the format:

{TITLE:['x'], AUTHOR:['y']}
{TITLE:['x', 'z'], AUTHOR:['y', 'r']}
{TITLE:['x', 'z', 'q'], AUTHOR:['y', 'r', 'p']}

That is, on each iteration it repeats the values for each key: every printed dictionary contains all the previous values plus the new one. This is not supposed to happen. What am I doing wrong?

Here is my code:

Firstly, I import the libraries:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep

Secondly, I install the proper version of chromedriver:

Service(ChromeDriverManager().install())

Thirdly, I open the browser automatically:

options = Options()
options.add_argument('window-size=250,1000')
driver = webdriver.Chrome(executable_path=r'C:\Users\dambr\Documents\scrapping\chromedriver.exe', options=options)
driver.implicitly_wait(5)

Fourthly, I open the Amazon site:

driver.get('https://www.amazon.com.br/')
a = driver.find_element(By.ID, "twotabsearchtextbox")
a.click()
a.send_keys('python')
b = driver.find_element(By.ID, "nav-search-submit-button")
b.click()
sleep(3)

Finally, I take all the titles and authors from my search and try to store them in a list of dictionaries:

dic_livros = {'TÍTULO':[], 'AUTOR':[]}
lista = '//*[@id="search"]/div[1]/div[1]/div/span[1]'
for i in lista:
    title = driver.find_elements(By.XPATH, "//span[@class='a-size-base-plus a-color-base a-text-normal']")
    author = driver.find_elements(By.XPATH, "//span[@class='a-size-base']")    
    for (each_title, each_author) in zip(title, author):
        dic_livros['TÍTULO'].append(each_title.text)
        dic_livros['AUTOR'].append(each_author.text)
        print(dic_livros)

Where, precisely, is my mistake?


CodePudding user response:

Your last step needs two changes. First, replace its first line with:

dic_livros = []

Then change the inner for loop to:

for (each_title, each_author) in zip(title, author):
    dic_livros.append({'Titulo':each_title.text,'Autor':each_author.text})
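
To see the pattern without a browser, here is a minimal sketch that uses plain strings in place of the Selenium elements. The helper name build_book_list and the sample values are made up for illustration; in the real script you would pass the .text of each element found by find_elements. The sketch also prints once, after the loop. In the question, print sits inside the loop, so the growing result is shown on every iteration.

```python
def build_book_list(titles, authors):
    """Pair each title with its author and return a list of dictionaries."""
    books = []
    for title, author in zip(titles, authors):
        # A new dictionary is created for every book, so no values are shared.
        books.append({'TÍTULO': title, 'AUTOR': author})
    return books


# Stand-in data (assumption): these replace each_title.text / each_author.text.
titles = ['x', 'z', 'q']
authors = ['y', 'w', 'p']

livros = build_book_list(titles, authors)
print(livros)  # printed once, after the loop has finished
```

Note also that in the question lista is a string, so "for i in lista" runs once per character of the XPath rather than once per search result. If the two find_elements calls already return every title and author on the page, that outer loop is probably unnecessary.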