I'm using Debian Bullseye (11.2) and I want to save the scraped data to a .csv file. How can I do this?
from scrapy.spiders import CSVFeedSpider


class CsSpiderSpider(CSVFeedSpider):
    name = 'cs_spider'
    allowed_domains = ['ocw.mit.edu/courses/electrical-engineering-and-computer-science/']
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science//feed.csv']
    # headers = ['id', 'name', 'description', 'image_link']
    # delimiter = '\t'

    # Do any adaptations you need here
    # def adapt_response(self, response):
    #     return response

    def parse_row(self, response, row):
        i = {}
        # i['url'] = row['url']
        # i['name'] = row['name']
        # i['description'] = row['description']
        return i
CodePudding user response:
The csv module is part of Python's standard library, so it is included with every Python installation.
You can use csv.writer() to create and write a .csv file without installing anything extra.
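A minimal sketch of that approach (the field names and rows below are just placeholders, not something from your spider):

import csv

# Hypothetical rows collected by a scraper
rows = [
    {'url': 'http://example.com/a', 'name': 'Course A', 'description': 'Intro'},
    {'url': 'http://example.com/b', 'name': 'Course B', 'description': 'Advanced'},
]

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['url', 'name', 'description'])  # header row
    for row in rows:
        writer.writerow([row['url'], row['name'], row['description']])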
CodePudding user response:
Here's an example of using the FEEDS export from Scrapy.
import scrapy
from scrapy.crawler import CrawlerProcess


class CsspiderSpider(scrapy.Spider):
    name = 'cs_spider'
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url, callback=self.parse_row
            )

    def parse_row(self, response):
        yield {
            'test': response.text
        }


process = CrawlerProcess(
    settings={
        'FEEDS': {
            'data.csv': {
                'format': 'csv'
            }
        }
    }
)
process.crawl(CsspiderSpider)
process.start()
This will save the spider's output to data.csv in .csv format. Furthermore, to specify which columns to export and their order, use FEED_EXPORT_FIELDS. You can read more about this in the docs.
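For instance, assuming your items have url, name, and description fields (matching the commented-out fields in your question), the settings could look like this:

process = CrawlerProcess(
    settings={
        'FEEDS': {
            'data.csv': {'format': 'csv'},
        },
        # Export only these fields, in this order
        'FEED_EXPORT_FIELDS': ['url', 'name', 'description'],
    }
)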
Alternatively, in the command line you can run:

scrapy crawl cs_spider -o output.csv

However, when running it this way, make sure to comment out all the code from process = CrawlerProcess( and below, since the scrapy crawl command starts its own crawler process.