I have extracted a datetime string from articles with Scrapy, and I want to get only the date from it.
The text looks like this:
" - Nov 13, 2021, 10:00 AM CST"
How can I extract just the date, i.e. Nov 13, 2021?
The script I currently use to get the text is:
'datetime': response.xpath('//*[@]/text()[2]').get()
Thank you in advance
CodePudding user response:
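A regex will work. This pattern should do the trick: \w+\s\d{1,2},\s\d{4}
import re
# Month name, one- or two-digit day, comma, four-digit year
pattern = re.compile(r'\w+\s\d{1,2},\s\d{4}')
# Named "text" rather than "datetime" to avoid shadowing the datetime module
text = response.xpath('//*[@]/text()[2]').get()
date = pattern.search(text).group()
print(date)
Out: Nov 13, 2021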
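The same idea can be tried outside of Scrapy on the sample string from the question. This is only a minimal sketch: the helper name extract_date and the constant DATE_PATTERN are made up for illustration, and the None checks assume that .get() may return None when the XPath matches nothing.

```python
import re
from typing import Optional

# Month name, one- or two-digit day, comma, four-digit year.
DATE_PATTERN = re.compile(r"[A-Za-z]+\s\d{1,2},\s\d{4}")


def extract_date(text: Optional[str]) -> Optional[str]:
    """Return the first 'Mon DD, YYYY' substring, or None if there is none.

    extract_date is a hypothetical helper, not part of Scrapy.
    """
    if text is None:  # e.g. the selector found no matching node
        return None
    match = DATE_PATTERN.search(text)
    return match.group() if match else None


sample = " - Nov 13, 2021, 10:00 AM CST"
print(extract_date(sample))  # prints: Nov 13, 2021
```

Returning None instead of calling .group() unconditionally avoids an AttributeError on articles where the date is missing or formatted differently.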
CodePudding user response:
You can apply the regex directly on the selector with its .re() method:
scrapy shell file:///PATH_TO_FILE/temp.html
In [1]: response.xpath('//*[@]/text()[2]').re(r'[a-zA-Z]{3} \d{1,2}, \d{4}')[0]
Out[1]: 'Nov 13, 2021'
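If a real date object is more useful than a string, the matched text can probably be handed to datetime.strptime. The sketch below assumes an English locale (the %b month abbreviation is locale-dependent) and uses a hypothetical helper name, parse_date; it works on a plain string rather than on a Scrapy response.

```python
import re
from datetime import date, datetime


def parse_date(text: str) -> date:
    """Find 'Mon DD, YYYY' in text and return it as a datetime.date.

    parse_date is a hypothetical helper used only for illustration.
    """
    match = re.search(r"[a-zA-Z]{3} \d{1,2}, \d{4}", text)
    if match is None:
        raise ValueError(f"no date found in {text!r}")
    # %b = abbreviated month name, %d = day of month, %Y = four-digit year
    return datetime.strptime(match.group(), "%b %d, %Y").date()


parsed = parse_date(" - Nov 13, 2021, 10:00 AM CST")
print(parsed.isoformat())
```

A date object can then be compared, sorted, or reformatted with strftime, which is harder to do reliably with the raw string.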