Get only the date from a text with Scrapy

Time:07-14

I have extracted a datetime string from articles with Scrapy, and from this text I want to get only the date.

The text looks like this:

" - Nov 13, 2021, 10:00 AM CST"

How can I extract only the date, i.e. Nov 13, 2021?

The code I currently use to get the text is:

'datetime': response.xpath('//*[@]/text()[2]').get()

Thank you in advance.

Answer:

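Using a regex will work. This pattern should do the trick: \w+?\s\d\d,\s\d{4} (note that \d\d only matches two-digit days; use \d{1,2} if you also need to match dates like Nov 3, 2021).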

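import re

# Month name, whitespace, two-digit day, comma, whitespace, four-digit year
pattern = re.compile(r'\w+?\s\d\d,\s\d{4}')
datetime_text = response.xpath('//*[@]/text()[2]').get()
match = pattern.search(datetime_text)  # None if the text contains no date
date = match.group() if match else None
print(date)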

Output: Nov 13, 2021
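To try this approach outside a running spider, here is a minimal standalone sketch that applies the regex \w+?\s\d{1,2},\s\d{4} to the sample string from the question. The function name extract_date is hypothetical, and widening the day to \d{1,2} (so that single-digit days also match) is an assumption beyond the pattern in this answer.

```python
import re

# Month name (one or more word characters), whitespace, a one- or two-digit
# day, a comma, whitespace and a four-digit year, e.g. "Nov 13, 2021".
DATE_PATTERN = re.compile(r'\w+?\s\d{1,2},\s\d{4}')


def extract_date(text):
    """Return the first 'Mon D, YYYY' substring in text, or None if absent."""
    if text is None:  # Scrapy's .get() returns None when the XPath matches nothing
        return None
    match = DATE_PATTERN.search(text)
    return match.group() if match else None


print(extract_date(" - Nov 13, 2021, 10:00 AM CST"))  # Nov 13, 2021
print(extract_date(" - Nov 3, 2021, 10:00 AM CST"))   # Nov 3, 2021
```

Guarding against None matters in a spider, because calling .search() on None raises a TypeError when a page lacks the expected element.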

Answer:

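You can also use the regex support built into Scrapy selectors, via the .re() method (shown here in the Scrapy shell):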

scrapy shell file:///PATH_TO_FILE/temp.html

In [1]: response.xpath('//*[@]/text()[2]').re(r'[a-zA-Z]{3} \d{1,2}, \d{4}')[0]
Out[1]: 'Nov 13, 2021'
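If you need an actual date object rather than a string (for example, to sort or store articles by date), one option, not taken from either answer, is to parse the whole timestamp with Python's datetime.strptime and keep only the date part. This sketch assumes the text always has the shape " - Mon D, YYYY, HH:MM AM/PM TZ" with English month abbreviations; the timezone label is dropped because strptime's %Z does not reliably recognise abbreviations like CST. The function name parse_article_date is hypothetical.

```python
from datetime import datetime


def parse_article_date(text):
    """Parse ' - Nov 13, 2021, 10:00 AM CST' into a datetime.date.

    Assumes the layout shown in the question; raises ValueError otherwise.
    """
    # Drop the leading " - " and any surrounding whitespace.
    cleaned = text.strip().lstrip("-").strip()
    # Remove the trailing timezone label ("CST") by keeping everything
    # before the last space.
    without_tz = cleaned.rsplit(" ", 1)[0]
    # %b = abbreviated month name (locale-dependent), %I/%p = 12-hour clock.
    parsed = datetime.strptime(without_tz, "%b %d, %Y, %I:%M %p")
    return parsed.date()


article_date = parse_article_date(" - Nov 13, 2021, 10:00 AM CST")
print(article_date)                        # 2021-11-13
print(article_date.strftime("%b %d, %Y"))  # Nov 13, 2021
```

Unlike the regex approaches, this fails loudly (ValueError) when the format changes, which can be preferable to silently storing a wrong or missing date.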