Instead of "a day ago", "minutes ago", or "hours ago", I want to scrape the actual published date - CodePudding

Instead of relative text such as "a day ago", "minutes ago", or "hours ago", I want to scrape the actual date: today's date if the article was published today, otherwise the date on which it was published. I am using Scrapy with Python.

This is the code I tried:

from datetime import date, datetime

# Example raw text: "Published Jul 30, 2019"
Published_Date = response.css('time::text').get().replace(",", "").replace("Published ", "")
if "AGO" in Published_Date.upper():  # relative text such as "2 hours ago"
    Published_Date = date.today()
else:
    Published_Date = datetime.strptime(Published_Date, "%b %d %Y").date()

URL of the site: https://simpleflying.com/us-carriers-dot-delay-compensation-push/

CodePudding user response：

You can scrape the datetime attribute (@datetime in XPath) directly from the <time> tag, parse it with the datetime module to get the date it was published, and subtract it from the current time to get a timedelta that tells you how long ago that was.

import scrapy
import datetime

class DTSpider(scrapy.Spider):
    name = 'dt'
    start_urls = ['https://simpleflying.com/us-carriers-dot-delay-compensation-push/']

    def parse(self, response):
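        # Read the machine-readable timestamp from the <time> tag's datetime attribute
        dt = response.css('span.meta_txt.date').xpath('./time/@datetime').get()
        # Drop the last character (presumably a trailing "Z") so fromisoformat() accepts the string
        date = datetime.datetime.fromisoformat(dt[:-1])
        print(date, '|', date.day, '|', date.month, '|', date.year)
        # 2022-10-23 17:10:00 | 23 | 10 | 2022  #<-- output
        # Note: today() is naive local time, so the difference is only approximate if the timestamp is in UTC
        today = datetime.datetime.today()
        delta = today - date
        print(delta.days)   # 0  <-- output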
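
If you want the comparison to be independent of your machine's time zone, one option is to parse the timestamp as timezone-aware UTC instead of slicing off the last character. The sketch below is only an illustration, not part of Scrapy: the helper names parse_iso_utc and relative_to_date are made up, the sample timestamp "2022-10-23T17:10:00Z" is assumed from the output shown above, and treating a timestamp without an offset as UTC is also an assumption. The second helper is a fallback for pages that only expose text like "3 hours ago".

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for illustration; neither is part of Scrapy.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def parse_iso_utc(value):
    """Parse an ISO 8601 timestamp such as '2022-10-23T17:10:00Z' as aware UTC."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def relative_to_date(text, now):
    """Turn text like '3 hours ago' or 'a day ago' into a date, or None."""
    match = re.search(r"\b(\d+|an?)\s+(minute|hour|day)s?\s+ago", text, re.IGNORECASE)
    if match is None:
        return None
    amount = 1 if match.group(1).lower() in ("a", "an") else int(match.group(1))
    unit = _UNITS[match.group(2).lower()]
    return (now - timedelta(**{unit: amount})).date()


now = datetime(2022, 10, 24, 1, 0, tzinfo=timezone.utc)  # fixed "now" for the example
published = parse_iso_utc("2022-10-23T17:10:00Z")
print(published.date())                      # 2022-10-23
print((now - published).days)                # 0
print(relative_to_date("3 hours ago", now))  # 2022-10-23
print(relative_to_date("a day ago", now))    # 2022-10-23
```

In a spider you would pass the value of the datetime attribute to parse_iso_utc and use datetime.now(timezone.utc) as "now". Note that "a day ago" maps to yesterday's date here rather than today's, which is usually what you want when storing a published date.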