I am trying to scrape the featured image of an article using Scrapy in Python, but I keep getting 'None'. I have tried three or four different selectors and none of them return the source link of the image. Can anyone explain why? Thanks in advance.
Here is the code:
import scrapy

class NewsSpider(scrapy.Spider):
    name = "cruisefever"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        Feature_Image = response.xpath('//*[@id="td_uid_2_634abd2257025"]/div/div[1]/div/div[8]/div/p[1]/img/@data-src').extract()[0]
        #Feature_Image = response.xpath('//*[@id="td_uid_2_634abd2257025"]/div/div[1]/div/div[8]/div/p[1]/img/@data-img-url').extract()[0]
        #Feature_Image = response.xpath('//*[@id="td_uid_2_634abd2257025"]/div/div[1]/div/div[8]/div/p[1]/img/@src').extract()[0]
        #Feature_Image = [i.strip() for i in response.css('img[class*="alignnone size-full wp-image-39164 entered lazyloaded"] ::attr(src)').getall()][0]
        yield {
            'Feature_Image': Feature_Image,
        }
Here is the website link: https://cruisefever.net/carnival-cruise-lines-oldest-ship-sailing-final-cruise/
CodePudding user response:
You can scrape the featured image using this XPath:
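import scrapy

class NewsSpider(scrapy.Spider):
    name = "cruisefever"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        # Anchor on the article container's id and collect every <img> below it;
        # index 1 picks the second match, which is used here as the featured image.
        image_tag = response.xpath('//div[@id="tdb-autoload-article"]/div/div/article/div/div/div/div/div/div//img')[1]
        # Read the image URL from the tag's src attribute.
        Feature_Image = image_tag.attrib['src']
        yield {
            'Feature_Image': Feature_Image,
        }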
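Since the question tried `data-src`, `data-img-url` and `src` one at a time, it may help to check them in order on the selected tag instead of relying on a single attribute. The following is only a minimal sketch: `pick_image_url` and `CANDIDATE_ATTRS` are hypothetical names, the attribute order is an assumption, and skipping `data:` values assumes that a lazy-loaded image may carry an inline placeholder in `src`.

```python
from typing import Mapping, Optional

# Assumed order: lazy-loading attributes first, plain src last.
CANDIDATE_ATTRS = ("data-src", "data-img-url", "src")


def pick_image_url(attrib: Mapping[str, str]) -> Optional[str]:
    """Return the first usable image URL found in an <img> tag's attributes."""
    for name in CANDIDATE_ATTRS:
        value = attrib.get(name, "").strip()
        # Skip missing or empty values and inline placeholders ("data:..." URIs).
        if value and not value.startswith("data:"):
            return value
    # No candidate attribute held a usable URL.
    return None
```

In the spider above it could be used as `Feature_Image = pick_image_url(image_tag.attrib)`, which yields `None` rather than raising a `KeyError` when the tag has no `src`.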