Using Scrapy and Scrapy shell in python to scrape the feature image from this website but it returns

Time:10-20

I'm using Scrapy and the Scrapy shell in Python to scrape the feature image from this website: https://www.thrillist.com/travel/nation/all-the-ways-to-cool-off-in-austin. Instead of the image's src, it returns this: data:image/gif;base64,R0. Can anyone tell me how to fix this so I get the actual src of the image?

Here is my code:

Feature_Image = [i.strip() for i in response.xpath('//*[@id="main-content"]/article/div/div/div[2]/div[1]/picture/img/@src').getall()][0]

Answer 1:

It looks like the img tag has a data-src attribute that holds the link followed by some image parameters, separated by semicolons. Splitting that value on ";" and taking the first part gets you the link.

>>> link = response.xpath("//div[@data-element-type='ParagraphMainImage']//img/@data-src").get().split(";")[0]
>>> link
'https://assets3.thrillist.com/v1/image/3086882/414x310/crop'

You can manually add .jpg to the end if you want to make the image type obvious. The link works with and without the extension.
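The steps in this answer (ignore the placeholder in src, strip the parameters from data-src, optionally append an extension) can be wrapped in a small helper. This is a sketch, not part of the original answer: the name `image_url_from_img` is hypothetical, and it assumes the placeholder in src is a `data:` URI and that data-src has the format seen on this page (URL followed by `;`-separated parameters).

```python
def image_url_from_img(src, data_src, extension=None):
    """Pick the real image URL from an <img> element's src/data-src values.

    Assumes a lazy-loading pattern: src may hold an inline data: URI
    placeholder, while data-src holds "URL;param;param..." (format
    assumed from the page in the question).
    """
    if src and not src.strip().startswith("data:"):
        url = src.strip()  # src already points at a real image
    elif data_src:
        # Keep only the part before the first ';' (drops webp=auto etc.)
        url = data_src.split(";", 1)[0].strip()
    else:
        return None  # neither attribute holds a usable URL
    # Optionally append an extension so the file type is obvious
    if extension:
        suffix = "." + extension.lstrip(".")
        if not url.endswith(suffix):
            url += suffix
    return url
```

In the Scrapy shell you might call it with `img = response.xpath("//div[@data-element-type='ParagraphMainImage']//img")` and then `image_url_from_img(img.xpath("@src").get(), img.xpath("@data-src").get(), "jpg")`.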

Answer 2:

The biggest image on that page should be the one marked for desktop (here, a source element with data-size="desktop"); that's common-sense logic. So why not try to locate its source like this?

pic = response.xpath('//picture[@data-testid="picture-tag"]//source[@data-size="desktop"]/@srcset').get()

The result is the source URL for the biggest size of that page's poster image:

https://assets3.thrillist.com/v1/image/3086882/1584x1056/crop;webp=auto;jpeg_quality=60;progressive.jpg
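In general a srcset value can hold several comma-separated candidates, each a URL plus an optional width (`414w`) or density (`2x`) descriptor; here the desktop source appears to hold a single URL, which would simply be returned as-is. If you want to pick the largest candidate generically, here is a minimal sketch. The helper name is hypothetical, and it assumes the URLs contain no commas and that one srcset uses only one descriptor type (the HTML spec does not allow mixing width descriptors with others). A candidate with no descriptor is treated as 1x.

```python
def largest_srcset_candidate(srcset):
    """Return the URL with the largest 'w' or 'x' descriptor in a srcset.

    Simplifications (assumptions): URLs contain no commas, and all
    candidates use the same descriptor type.
    """
    best_url, best_value = None, -1.0
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue  # skip empty entries, e.g. from a trailing comma
        url = parts[0]
        value = 1.0  # no descriptor means 1x
        if len(parts) > 1 and parts[1][-1] in "wx":
            value = float(parts[1][:-1])
        if value > best_value:
            best_url, best_value = url, value
    return best_url
```

Since `.get()` returns None when nothing matches, guard the call, e.g. `if pic: url = largest_srcset_candidate(pic)`.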