Save scraped data in different dictionaries using a filter condition

Time:06-21

I am scraping two URLs from the same spider as follows:

def start_requests(self):
    # request the Dawn category pages; meta carries the category to the callback
    yield Request('https://www.dawn.com/business', callback=self.parseDawn, meta={'category': 'business', 'source': 'DAWN'})
    yield Request('https://www.dawn.com/sport', callback=self.parseDawn, meta={'category': 'sports', 'source': 'DAWN'})

where self.parseDawn scrapes the news from the links as follows:

def parseDawn(self, response):
    items = WebscrapingItem()

    # no trailing commas here: they would turn each value into a one-element tuple
    # default='' avoids calling .strip() on None when the selector matches nothing
    title = response.css("h2.story__title a.story__link::text").extract_first(default='').strip()
    author = response.css("span.story__byline a.story__byline__link::text").extract_first()
    category = response.meta['category']

    items['title'] = title
    items['author'] = author
    items['category'] = category

    yield items

Now, in my pipelines.py file, I want to split the scraped news into two separate dictionaries: one for items with category == 'business' and one for items with category == 'sports'. I am doing this so that the filtered news can be saved separately in my database. Is there a way to do this?

CodePudding user response:

You can do that in your pipeline by branching on the item's category:


class BotPipeline:
    def process_item(self, item, spider):
        if item['category'] == 'business':
            # insert db operation for business items here
            pass
        elif item['category'] == 'sports':
            # insert db operation for sports items here
            pass
        # always return the item so any later pipeline still receives it
        return item
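
If you really want the items collected in separate containers before writing them, one possible approach is to keep a dictionary of lists keyed by category. The following is only a sketch under assumptions: the class name CategorySplitPipeline, the by_category attribute and the save_rows method are hypothetical names made up for illustration, and save_rows is a stub where your own database code would go. open_spider, process_item and close_spider are the standard Scrapy pipeline hooks.

```python
from collections import defaultdict


class CategorySplitPipeline:
    """Hypothetical pipeline that groups scraped items by their 'category' field."""

    # Categories taken from the question; any other category is passed through untouched.
    TRACKED = ("business", "sports")

    def open_spider(self, spider):
        # One list of plain dicts per category, e.g. self.by_category["business"].
        self.by_category = defaultdict(list)

    def process_item(self, item, spider):
        category = item.get("category")
        if category in self.TRACKED:
            # Store a plain-dict copy so later changes to the item do not affect it.
            self.by_category[category].append(dict(item))
        # Always return the item so later pipelines still receive it.
        return item

    def close_spider(self, spider):
        # Hand each group to the (hypothetical) storage hook once the crawl ends.
        for category in self.TRACKED:
            self.save_rows(category, self.by_category[category])

    def save_rows(self, category, rows):
        # Placeholder: replace with your own database insert logic.
        pass
```

To try it, the class would need to be listed under ITEM_PIPELINES in settings.py with the module path of your own project. Note that this keeps every tracked item in memory until the spider closes, so for large crawls writing to the database directly inside process_item, as in the answer above, is likely the safer choice.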