Instead of relative labels such as "a day ago", "minutes ago" or "hours ago", I want to scrape the actual date: today's date if the article was published today, otherwise the day it was published. I'm using Scrapy with Python.
This is the code I tried:
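```python
from datetime import date, datetime

# e.g. "Published Jul 30, 2019" or "Published 5 hours ago"
Published_Date = response.css('time::text').get().replace(",", "").replace("Published ", "")
if "ago" in Published_Date.lower():  # case-insensitive, so "ago" and "AGO" both match
    Published_Date = date.today()
else:
    Published_Date = datetime.strptime(Published_Date, "%b %d %Y").date()
```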
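Mapping every "... ago" label to today is only correct for minutes or a few hours; "a day ago" should become yesterday. One way to handle this is to convert the relative label into an absolute date with `timedelta`. This is a minimal sketch, assuming the site's labels look like "Published 5 hours ago", "Published a day ago" or "Published Jul 30, 2019" (the exact wording is an assumption); `resolve_published_date` is a hypothetical helper, not part of Scrapy.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

# Hypothetical unit table: the label formats are assumptions about the site
_UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def resolve_published_date(text: str, now: Optional[datetime] = None) -> date:
    """Turn 'Published 5 hours ago' or 'Published Jul 30, 2019' into a date."""
    now = now or datetime.now()
    label = text.replace("Published", "").replace(",", "").strip()
    # Relative form: "<number|a|an> <unit>[s] ago", matched case-insensitively
    match = re.fullmatch(r"(\d+|an?) (minute|hour|day|week)s? ago", label, re.IGNORECASE)
    if match:
        count = 1 if match.group(1).lower() in ("a", "an") else int(match.group(1))
        return (now - count * _UNITS[match.group(2).lower()]).date()
    # Absolute form: "Jul 30 2019" once the comma has been removed
    return datetime.strptime(label, "%b %d %Y").date()
```

Passing `now` explicitly makes the helper easy to test; in the spider you would call `resolve_published_date(response.css('time::text').get())`.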
URL of the site: https://simpleflying.com/us-carriers-dot-delay-compensation-push/
Answer:
You can scrape the `@datetime` attribute directly from the `<time>` tag, then use the `datetime` module to parse the date it was published and a `timedelta` to check how long ago it was published.
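```python
import datetime

import scrapy


class DTSpider(scrapy.Spider):
    name = 'dt'
    start_urls = ['https://simpleflying.com/us-carriers-dot-delay-compensation-push/']

    def parse(self, response):
        # Machine-readable timestamp from <time datetime="...">, ending in "Z" (UTC)
        dt = response.css('span.meta_txt.date').xpath('./time/@datetime').get()
        # Drop the trailing "Z": fromisoformat() only accepts it from Python 3.11 on
        date = datetime.datetime.fromisoformat(dt[:-1])
        print(date, '|', date.day, '|', date.month, '|', date.year)
        # 2022-10-23 17:10:00 | 23 | 10 | 2022 #<-- output
        # Compare against the current UTC time (naive, like `date`), not local time
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
        delta = now - date
        print(delta.days)  # 0 <-- output
```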
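If you want the spider to return the date rather than print it, the parsing step can be pulled out into a small function that is easy to test without a network request. This is a sketch, assuming the attribute is an ISO 8601 timestamp like "2022-10-23T17:10:00Z" and that "today" is judged in UTC; `published_date_from_iso` is a hypothetical name.

```python
from datetime import date, datetime, timezone
from typing import Optional, Tuple


def published_date_from_iso(value: str, now: Optional[datetime] = None) -> Tuple[date, int]:
    """Return (publication date in UTC, whole days since publication).

    `value` is assumed to look like "2022-10-23T17:10:00Z"; `now`, if given,
    must be timezone-aware.
    """
    # Replace "Z" with "+00:00" so fromisoformat() also works before Python 3.11
    published = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if published.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC
        published = published.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    days_ago = (now - published).days
    return published.astimezone(timezone.utc).date(), days_ago
```

Inside `parse`, something like `pub_date, days_ago = published_date_from_iso(dt)` followed by `yield {'published': pub_date.isoformat()}` would give you the publication date directly, which is today's date whenever the article was published today (in UTC).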