I'm trying to scrape the Amazon website with Scrapy. I can easily scrape fields such as the product title or price, but I have no clue how to extract the URL of a product (marked in the picture at the bottom). Currently my parse function looks like this:
def parse(self, response):
    items = BigItem()
    all_boxes = response.css('.s-widget-spacing-small > .sg-col-inner')
    for boxes in all_boxes:
        name = boxes.css('.s-link-style .a-text-normal').css('::text').extract()
        author = boxes.css('.a-color-secondary .a-size-base:nth-child(2)').css('::text').extract()
        price = boxes.css('.s-price-instructions-style .a-price-whole').css('::text').extract()
        imagelink = boxes.css('.s-image::attr(src)').extract()
        rating = boxes.css('.a-spacing-top-small .aok-align-bottom').css('::text').extract()
        valuation = boxes.css('.a-spacing-top-small .s-link-style .s-underline-text').css('::text').extract()
        link = boxes.css('a-link-normal s-underline-text s-underline-link-text s-link-style a-text-normal::attr(href)').extract()
        items['name'] = name
        items['author'] = author
        items['price'] = price
        items['imagelink'] = imagelink
        items['rating'] = rating
        items['valuation'] = valuation
        items['link'] = link
        yield items
I also tried extracting it with ::text, and with a chained .css('::text') or .css('::href'), but it's not working.
[enter image description here][1]
[1]: https://i.stack.imgur.com/f1doP.png
CodePudding user response:
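Put a period in front of every class name, e.g. .a-link-normal. The class list in the question looks like it was copied from a single element's class attribute, so chain the classes without spaces (a space between them would mean "descendant of"):
link = boxes.css('.a-link-normal.s-underline-text.s-underline-link-text.s-link-style.a-text-normal::attr(href)').extract()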
CodePudding user response:
Use the .extract_first() or .get() method to get a single string instead of a list, and put a period in front of every class name:
link = boxes.css('.a-link-normal.s-underline-text.s-underline-link-text.s-link-style.a-text-normal::attr(href)').get()
items['link'] = 'https://www.amazon.de/' + link
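Plain string concatenation fails when the selector matches nothing (.get() returns None) and can produce a doubled slash when the href already starts with "/". A small sketch of a more forgiving variant, using the standard library's urljoin; the helper name absolute_link and the example href are hypothetical:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.amazon.de/"  # base URL taken from the answer above


def absolute_link(href):
    """Return an absolute URL, or None when the selector matched nothing."""
    if href is None:
        return None
    # urljoin resolves relative paths and leaves absolute URLs untouched.
    return urljoin(BASE_URL, href)


print(absolute_link("/dp/EXAMPLE01"))
```

Inside a spider callback, response.urljoin(link) does the same job using the URL of the current response as the base.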