I am having trouble with my code when scraping the Amazon site with Selenium. I want a list of dictionaries, one per book, with the title and author as keys and values, in the format:
[{TITLE:'x', AUTHOR:'y'},
 {TITLE:'z', AUTHOR:'w'}]
However, it returns a dictionary of lists, with the keys and values repeated, in the format:
{TITLE:['x'], AUTHOR:['y']}
{TITLE:['x', 'z'], AUTHOR:['y', 'r']}
{TITLE:['x', 'z', 'q'], AUTHOR:['y', 'r', 'p']}
That is, on each iteration it repeats the values for each key: it shows the previous values and includes them in the next dictionary. This is not supposed to happen. What am I doing wrong?
Here is my code:
Firstly, I import the libraries:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep
Secondly, I install the proper version of chromedriver:
Service(ChromeDriverManager().install())
Thirdly, I open the browser automatically:
options = Options()
options.add_argument('window-size=250,1000')
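# Selenium 4 takes the driver path through a Service object; passing executable_path directly to Chrome() is deprecated
driver = webdriver.Chrome(service=Service(executable_path=r'C:\Users\dambr\Documents\scrapping\chromedriver.exe'), options=options)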
driver.implicitly_wait(5)
Fourthly, I open the Amazon site:
driver.get('https://www.amazon.com.br/')
a = driver.find_element(By.ID, "twotabsearchtextbox")
a.click()
a.send_keys('python')
b = driver.find_element(By.ID, "nav-search-submit-button")
b.click()
sleep(3)
Finally, I take all the titles and authors from my search and try to store them in a list of dictionaries:
dic_livros = {'TÍTULO':[], 'AUTOR':[]}
lista = '//*[@id="search"]/div[1]/div[1]/div/span[1]'
for i in lista:
    title = driver.find_elements(By.XPATH, "//span[@class='a-size-base-plus a-color-base a-text-normal']")
    author = driver.find_elements(By.XPATH, "//span[@class='a-size-base']")
    for (each_title, each_author) in zip(title, author):
        dic_livros['TÍTULO'].append(each_title.text)
        dic_livros['AUTOR'].append(each_author.text)
        print(dic_livros)
Where, precisely, is my mistake?
CodePudding user response:
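Your last step needs two changes. As written, every title and author is appended to the same two lists, so each print shows everything collected so far. First, replace the first line of that step with an empty list: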
dic_livros = []
Then, in the inner for loop, append one dictionary per book:
for (each_title, each_author) in zip(title, author):
    dic_livros.append({'Titulo':each_title.text,'Autor':each_author.text})
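To see the pattern in isolation, here is a minimal sketch that runs without Selenium. The lists titles and authors are made-up stand-ins for the .text values of the elements returned by find_elements, and build_books is a hypothetical helper name, not part of your code. It also leaves out the outer "for i in lista" loop: lista is a string, so that loop runs once per character and would repeat the whole search each time.

```python
# Stand-in data (an assumption): in the real script these strings would
# come from each_title.text and each_author.text on the Selenium elements.
titles = ['x', 'z', 'q']
authors = ['y', 'r', 'p']


def build_books(titles, authors):
    """Return one {'Titulo': ..., 'Autor': ...} dict per (title, author) pair."""
    books = []
    for each_title, each_author in zip(titles, authors):
        # A new dict is created for every book, so nothing is shared
        # between entries and earlier values are not repeated.
        books.append({'Titulo': each_title, 'Autor': each_author})
    return books


dic_livros = build_books(titles, authors)
# Print once, after the loop, instead of on every iteration.
print(dic_livros)
```

Note that zip stops at the shorter of the two inputs, so if the author XPath matches a different number of elements than the title XPath, the pairs may not line up with the actual books.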