I am trying to scrape multiple author names with Scrapy in Python, but because of the inner div and the CSS class changing for each author, I am getting errors.
This is the error I get: AttributeError: 'SelectorList' object has no attribute 'response'
class NewsSpider(scrapy.Spider):
    name = "travelandleisure"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        try:
            Authoro = response.css('div.comp mntl-bylines__group--author mntl-bylines__group mntl-block')
            Author = []
            for item in Authoro.response.css('div.comp mntl-bylines__item mntl-attribution__item::text'):
                Authoro.append(item)
            for item in Authoro.response.css('div.comp mntl-bylines__item mntl-attribution__item mntl-attribution__item--has-date::text'):
                Authoro.append(item)
        except IndexError:
            Author = "NULL"
        yield {
            'Category': Category,
            'Headlines': Headlines,
            'Author': Author,
        }
Here is the link to the site, where you can see the HTML for the authors:
https://www.travelandleisure.com/travel-news/where-can-americans-travel-right-now-a-country-by-country-guide
CodePudding user response:
This is one way to get the authors from that page:
[...]
    def parse(self, response):
        title = response.xpath('//h1[@id="article-heading_1-0"]/text()').get()
        authors = ', '.join(set([x.strip() for x in response.xpath('//a[@]/text()').extract()]))
        ## [... other stuff from page]
        yield {
            'title': title.strip(),
            'authors': authors,
            ## [..]
        }
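As for the error in the question: response.css(...) returns a SelectorList, and a SelectorList has no response attribute, so .css() should be called on the list itself (Authoro.css(...)). The selectors in the question also separate the class names with spaces, which CSS reads as "descendant element" rather than "same element with several classes"; each class needs its own leading dot. Below is a minimal sketch of both points using only the standard library. The helper names css_for_classes and unique_in_order are hypothetical, the sample names are invented, and it assumes the class names quoted in the question really are the ones on the page. Note that ::text is a Scrapy selector extension, not standard CSS.

```python
def css_for_classes(tag, class_attr, pseudo=""):
    """Build a CSS selector for a tag that carries every class in class_attr."""
    # A space inside a CSS selector means "descendant", so every class name
    # from the HTML class attribute gets its own leading dot instead.
    return tag + "".join("." + name for name in class_attr.split()) + pseudo


def unique_in_order(names):
    """Strip whitespace, drop empty strings and duplicates, keep first-seen order."""
    cleaned = (name.strip() for name in names)
    return list(dict.fromkeys(name for name in cleaned if name))


# Class attribute copied from the question (assumed to match the page).
item_selector = css_for_classes(
    "div", "comp mntl-bylines__item mntl-attribution__item", "::text"
)
print(item_selector)

# Invented strings standing in for what a selector's getall() might return.
sample = ["  Jane Doe ", "John Roe", "Jane Doe", "\n"]
print(", ".join(unique_in_order(sample)))
```

Inside the spider, the selector string could then be used as Authoro.css(item_selector).getall(). Unlike set(), dict.fromkeys keeps the authors in the order they appear on the page.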
Scrapy documentation: https://docs.scrapy.org/en/latest/