I'm trying to scrape a specific web page. In the console I see all the results, but the exported CSV is missing some of them. I want both the title and the author for a specific search, but the CSV only contains the title. If I swap the order of the two loops, I get only the author, so it seems only the first one is kept. Why?
import scrapy
QUERY = "q=brilliant friend&qt=results_page#x0%3Abook-,%28x0%3Abook+x4%3Aprintbook%29,%28x0%3Abook+x4%3Adigital%29,%28x0%3Abook+x4%3Alargeprint%29,%28x0%3Abook+x4%3Amss%29,%28x0%3Abook+x4%3Athsis%29,%28x0%3Abook+x4%3Abraille%29,%28x0%3Abook+x4%3Amic%29,x0%3Aartchap-,%28x0%3Aartchap+x4%3Achptr%29,%28x0%3Aartchap+x4%3Adigital%29format"
class Spider(scrapy.Spider):
    name = 'worldcatspider'
    start_urls = ['https://www.worldcat.org/search?start=%s&%s' % (number, QUERY) for number in range(0, 4400, 10)]

    def parse(self, response):
        for title in response.css('.name a > strong ::text').extract():
            yield {"title:": title}
        for author in response.css('.author ::text').extract():
            yield {"author:": author}
CodePudding user response:
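The CSV exporter most likely takes its columns from the first item it receives, so items that only carry the other key end up with nothing to write. My suggestion is to loop over the parent element (class or div) that wraps each result and extract both fields inside that loop, so that every item contains a title and an author.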
I haven't checked but this should work:
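def parse(self, response):
    # Loop over each result container so title and author stay together.
    for page in response.css('.menuElem'):
        title = page.css('.name a > strong ::text').extract()
        author = page.css('.author ::text').extract()
        # Yield one item per result with both fields.
        yield {"title": title,
               "author": author}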
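
To illustrate why yielding the two fields as separate items loses a column: as far as I know, when no explicit field list is configured, the CSV exporter takes its header from the first item it receives. Below is a small stdlib-only sketch that mimics that behavior. The `export_csv` helper and the sample data are made up for illustration and are not Scrapy's actual code.

```python
import csv
import io


def export_csv(items):
    """Write dict items to CSV, taking the header from the first item only.

    Simplified stand-in for a feed exporter that is not given an explicit
    field list; this is an assumption-based sketch, not Scrapy's code.
    """
    buffer = io.StringIO()
    writer = None
    for item in items:
        if writer is None:
            # Columns are fixed by whichever item arrives first.
            writer = csv.DictWriter(
                buffer,
                fieldnames=list(item.keys()),
                restval="",             # missing keys become empty cells
                extrasaction="ignore",  # unknown keys are silently dropped
                lineterminator="\n",
            )
            writer.writeheader()
        writer.writerow(item)
    return buffer.getvalue()


# Hypothetical sample data, not taken from the real page.
separate_items = [
    {"title": "My Brilliant Friend"},
    {"author": "Elena Ferrante"},
]
combined_items = [
    {"title": "My Brilliant Friend", "author": "Elena Ferrante"},
]

separate_csv = export_csv(separate_items)  # only a "title" column survives
combined_csv = export_csv(combined_items)  # both columns are written

print(separate_csv)
print(combined_csv)
```

With the separate items, the author value never reaches the file because the header was already fixed to the title column. With one item per result, both columns are written. If the column order matters, Scrapy's `FEED_EXPORT_FIELDS` setting can be used to list the fields explicitly.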