Scrapy doesn't save all results to csv

Time:03-11

I'm trying to scrape a specific web page. The console shows all the results, but the exported CSV does not. I want both the title and the author for each search result, but the CSV contains only the titles. If I swap the order of the two loops, I get only the authors, so only the first field is saved. Why?

import scrapy

QUERY = "q=brilliant friend&qt=results_page#x0%3Abook-,%28x0%3Abook+x4%3Aprintbook%29,%28x0%3Abook+x4%3Adigital%29,%28x0%3Abook+x4%3Alargeprint%29,%28x0%3Abook+x4%3Amss%29,%28x0%3Abook+x4%3Athsis%29,%28x0%3Abook+x4%3Abraille%29,%28x0%3Abook+x4%3Amic%29,x0%3Aartchap-,%28x0%3Aartchap+x4%3Achptr%29,%28x0%3Aartchap+x4%3Adigital%29format"

class Spider(scrapy.Spider):
    name = 'worldcatspider'
    start_urls = ['https://www.worldcat.org/search?start=%s&%s' % (number, QUERY) for number in range(0, 4400, 10)]

    def parse(self, response):
        for title in response.css('.name a > strong ::text').extract():
            yield {"title:": title}
        for author in response.css('.author ::text').extract():
            yield {"author:": author}

CodePudding user response:

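My suggestion is to loop over the parent element (a class or div) that wraps each search result and extract both fields inside that loop, so every yielded item carries both keys. When no export fields are configured, Scrapy's CSV exporter takes its columns from the first item it receives, which is why only the first field reaches the file.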

I haven't checked but this should work:

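def parse(self, response):
    # Loop over each result container so title and author stay paired.
    # '.menuElem' is untested; adjust it to the element wrapping one result.
    for page in response.css('.menuElem'):
        title = page.css('.name a > strong ::text').extract()
        author = page.css('.author ::text').extract()
        # Yield both fields in one item so the CSV gets both columns.
        yield {"title": title,
               "author": author}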
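
To see why the original spider loses a column, here is a rough standard-library sketch. It is not Scrapy's exporter code. It only imitates the assumed behaviour that the CSV header is fixed by the first item when FEED_EXPORT_FIELDS is not set. The function name export_like_first_item and the sample book and author values are made up for illustration.

```python
import csv
import io


def export_like_first_item(items):
    """Write dicts to CSV, taking the header from the first item only.

    Hypothetical helper: it imitates the assumed exporter behaviour,
    it is not part of Scrapy.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(items[0]),  # columns are fixed by the first item
        restval="",                 # keys missing from an item become empty cells
        extrasaction="ignore",      # keys not in the header are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Items shaped like the original spider's output: one key per dict.
separate = [
    {"title": "Book A"},
    {"title": "Book B"},
    {"author": "Author A"},
    {"author": "Author B"},
]

# Items shaped like the per-result loop's output: both keys in one dict.
combined = [
    {"title": "Book A", "author": "Author A"},
    {"title": "Book B", "author": "Author B"},
]

separate_csv = export_like_first_item(separate)
combined_csv = export_like_first_item(combined)

print(separate_csv)
print(combined_csv)
```

With the separate items the header has only a title column, so the author values never reach the output. With the combined items both columns appear. If the column order matters, the FEED_EXPORT_FIELDS setting can also pin it explicitly.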