I am trying to scrape an author name using Scrapy in Python, but it returns None or sometimes only whitespace

Time:10-14

I am trying to scrape an author name using Scrapy in Python, but the result is None or sometimes "\t\t\t\t\t\t\n\n\n\n\n\t" instead of the author name. I have tried several approaches, including response.css and response.xpath. I had the same problem with the article headline when I copied the XPath from the browser inspector. Copying the XPath with SelectorGadget fixed the headline, but the SelectorGadget XPath for the author still does not work.

Here is my code:

import scrapy


class NewsSpider(scrapy.Spider):
    name = "cruiseradio"

    def start_requests(self):
        url = input("Enter the article url: ")
        
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        try:
            Author = response.css('span.elementor-post-info__item--type-author::text').get()
        except IndexError:
            Author = "NULL"
        yield {
            'Author': Author,
        }

Here is the URL of the site: https://cruiseradio.net/new-expedition-ship-delivered-atlas-ocean-voyages/

CodePudding user response:

Try this:

[i.strip() for i in response.css('[itemprop="author"] span[class*="item--type-author"] ::text').getall() if i.strip() and i.lower().strip() != "by"][0]

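This works if there is only one author per post. If there can be several, remove the [0] from the end to get the full list. The difference from your selector is that .get() returns only the first matching text node, which on this page is whitespace, while ' ::text' with .getall() collects every text node under the element; the list comprehension then drops the empty strings and the "By" prefix. Note that [0] raises an IndexError if nothing matches.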
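If you would rather not risk an IndexError when the selector matches nothing, the filtering can be moved into a small helper. This is only a sketch: the names clean_author_texts and first_author are made up for illustration, and the sample list is an assumed shape of what .getall() might return for this page, not actual output from it.

```python
def clean_author_texts(texts, prefixes=("by",)):
    """Strip each text fragment and drop empty ones and prefix words like "By"."""
    cleaned = []
    for text in texts:
        value = text.strip()
        # Skip whitespace-only fragments and the "By" label before the name.
        if value and value.lower() not in prefixes:
            cleaned.append(value)
    return cleaned


def first_author(texts, default=None):
    """Return the first cleaned author name, or `default` if there is none."""
    authors = clean_author_texts(texts)
    return authors[0] if authors else default


# Assumed example of what response.css(...).getall() could return.
sample = ["\t\t\t\t\t\t\n\n\n\n\n\t", "By", " Jane Doe\n\t"]
author = first_author(sample, default="NULL")
```

Inside the spider you would pass it the result of the selector above, for example first_author(response.css('[itemprop="author"] span[class*="item--type-author"] ::text').getall(), default="NULL").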
