I am trying to scrape multiple author names using Scrapy in Python but because of inner div and change

Time:10-12

I am trying to scrape multiple author names using Scrapy in Python, but because of the inner div and the CSS class changing for each author, I am getting this error: AttributeError: 'SelectorList' object has no attribute 'response'

class NewsSpider(scrapy.Spider):
    name = "travelandleisure"

    def start_requests(self):
        url = input("Enter the article url: ")
        
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        try:
            Authoro = response.css('div.comp mntl-bylines__group--author mntl-bylines__group mntl-block')
            Author = []
            for item in Authoro.response.css('div.comp mntl-bylines__item mntl-attribution__item::text'):
                Authoro.append(item)
            for item in Authoro.response.css('div.comp mntl-bylines__item mntl-attribution__item mntl-attribution__item--has-date::text'):
                Authoro.append(item)
        except IndexError:
            Author = "NULL"
        yield{
            'Category':Category,
            'Headlines':Headlines,
            'Author': Author,
        }

Here is the link to the site, where you can see the HTML for the authors: https://www.travelandleisure.com/travel-news/where-can-americans-travel-right-now-a-country-by-country-guide

CodePudding user response:

This is one way to get the authors from that page:

[...]
def parse(self, response):
        title = response.xpath('//h1[@id="article-heading_1-0"]/text()').get()
        # Strip each name and drop duplicates before joining
        authors = ', '.join({x.strip() for x in response.xpath('//a[@]/text()').getall()})
        ##[... other stuff from page]
        yield {
            # .get() returns None when nothing matches, so guard the strip()
            'title': title.strip() if title else None,
            'authors': authors,
            ## [..]
        }
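
As for the original error: response.css(...) returns a SelectorList, which has its own .css() method but no .response attribute, so Authoro.response.css(...) raises the AttributeError. Separately, class names separated by spaces in a CSS selector are read as nested element names, so several classes on one element have to be joined with dots. Below is a minimal sketch of both points, not a tested fix for that site. The helper names class_selector and extract_authors are made up for illustration, the class names are copied from the question and have not been checked against the live page, and the descendant ::text query is an assumption about where the names sit in the markup.

```python
def class_selector(tag, class_attr):
    """Build a compound CSS selector from an HTML class attribute value.

    "comp mntl-bylines__item" on a <div> becomes "div.comp.mntl-bylines__item".
    """
    return tag + "".join("." + name for name in class_attr.split())


def extract_authors(response):
    """Return unique, stripped author names in page order.

    `response` is anything with a Scrapy-style .css() method, such as the
    response object passed to a spider callback.
    """
    # Class names taken from the question; assumed, not verified on the site.
    group = response.css(class_selector("div", "mntl-bylines__group--author"))
    # Call .css() directly on the SelectorList (no `.response` in between).
    # " ::text" with a leading space selects text nodes of all descendants.
    texts = group.css(
        class_selector("div", "mntl-bylines__item") + " ::text"
    ).getall()
    names = [t.strip() for t in texts if t.strip()]
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # unlike set(), whose iteration order is arbitrary.
    return list(dict.fromkeys(names))


print(class_selector("div", "comp mntl-bylines__item mntl-attribution__item"))
# prints: div.comp.mntl-bylines__item.mntl-attribution__item
```

Inside a spider callback this could be used as 'Author': extract_authors(response). An empty list simply means nothing matched, since .css() does not raise IndexError, so the try/except from the question is not needed.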

Scrapy documentation: https://docs.scrapy.org/en/latest/
