I am trying to scrape the author name with Scrapy in Python, but the result is either None or a whitespace string such as "\t\t\t\t\t\t\n\n\n\n\n\t" instead of the author name. I have tried several approaches, including response.css and response.xpath. The article headline had the same problem when I copied the XPath from the browser inspector. Copying the XPath with SelectorGadget fixed the headline, but the SelectorGadget XPath does not work for the author.
Here is my code:
    import scrapy


    class NewsSpider(scrapy.Spider):
        name = "cruiseradio"

        def start_requests(self):
            url = input("Enter the article url: ")
            yield scrapy.Request(url, callback=self.parse_dir_contents)

        def parse_dir_contents(self, response):
            try:
                # Returns None or only whitespace instead of the author name
                Author = response.css('span.elementor-post-info__item--type-author::text').get()
            except IndexError:
                Author = "NULL"
            yield {
                'Author': Author,
            }
Here is the URL of the site: https://cruiseradio.net/new-expedition-ship-delivered-atlas-ocean-voyages/
CodePudding user response:
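Try this:
    [i.strip() for i in response.css('[itemprop="author"] span[class*="item--type-author"] ::text').getall() if i.strip() and i.lower().strip() != "by"][0]
The space before ::text selects the text of every descendant of the span, not only the span's own text nodes, and the list comprehension then drops the whitespace-only strings and the "by" label. The [0] at the end returns the first author, which is fine if there is only one author per post. If there can be several, remove the [0] to get the full list. Note that [0] raises IndexError when nothing matches.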
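The filtering step in that one-liner can also be pulled out into a small helper, which makes it easier to read and avoids the IndexError on an empty result. The following is only a sketch: extract_authors and the sample_nodes data are hypothetical names and made-up values for illustration, not something taken from the site or from Scrapy itself.

```python
def extract_authors(text_nodes, skip_words=("by",)):
    """Strip each text node and drop empty strings and label words such as 'By'."""
    authors = []
    for node in text_nodes:
        text = node.strip()
        if text and text.lower() not in skip_words:
            authors.append(text)
    return authors


# Made-up text nodes resembling what a descendant "::text" query could return
# (assumption for illustration only, not copied from the real page).
sample_nodes = ["\n\t\t\t\t\t", "By", "\n\t\t\t\t\t", "Jane Doe", "\n\t\t\t"]

authors = extract_authors(sample_nodes)
# Fall back to None instead of raising IndexError when nothing matched.
first_author = authors[0] if authors else None
print(first_author)
```

Inside the spider, the list passed in would be the result of response.css('[itemprop="author"] span[class*="item--type-author"] ::text').getall(), assuming that selector matches the page as the answer suggests.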