Python regEx for datetime mining

Time:10-02

I'm trying to extract dates from a text into a list using re.

Here is what I've written:

import re

dateStr =  "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
regex = r'(?:\d{1,2}[/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'
result = re.findall(regex, dateStr)

Even though I put (?:\d{1,2}[/-]*) at the beginning of the expression, the day digits are missing from the matches. Here is what I get:

['Mar 2009', 'March 2009', 'Mar. 2009', 'March, 2009']

Could you help? Thanks

Edit:
This question was solved through the comments.

Original assignment string: dateStr = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
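
The comments themselves are not reproduced here, so what follows is only a sketch of one likely cause. In the pattern from the question (taken here with its + quantifiers), no token between the optional day group and (?:Mar)? can consume the space after the day, so a match can only start at "Mar". Allowing optional whitespace inside the day group brings the day back for this sample. The \s* is an assumed fix, not part of the original pattern, and the variable names are illustrative:

```python
import re

date_str = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"

# Pattern from the question: nothing can match the space between "20" and "Mar".
original = r'(?:\d{1,2}[/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'

# Assumed fix: \s* added inside the optional day group.
patched = r'(?:\d{1,2}[/-]*\s*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'

without_day = re.findall(original, date_str)
with_day = re.findall(patched, date_str)
print(without_day)  # ['Mar 2009', 'March 2009', 'Mar. 2009', 'March, 2009']
print(with_day)     # ['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']
```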

CodePudding user response:

I would use:

import re

dateStr =  "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
dt = re.findall(r'\d{1,2} \w+[,.]? \d{4}', dateStr)
print(dt)  # ['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']

The one-size-fits-all regex pattern used above says to match:

\d{1,2}  a one or two digit day
[ ]      space
\w+      month name or abbreviation
[,.]?    possibly followed by comma or period
[ ]      space
\d{4}    four digit year
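
The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores unescaped whitespace and allows comments inside the pattern. This is a minimal sketch, not the answerer's code; because verbose mode drops literal spaces, each space is written as [ ], and the variable names are illustrative:

```python
import re

date_str = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"

# Verbose form of the pattern explained above.
pattern = re.compile(r"""
    \d{1,2}   # one or two digit day
    [ ]       # space
    \w+       # month name or abbreviation
    [,.]?     # possibly followed by comma or period
    [ ]       # space
    \d{4}     # four digit year
""", re.VERBOSE)

matches = pattern.findall(date_str)
print(matches)  # ['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']
```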

CodePudding user response:

One of the many approaches:

import re
dateStr =  "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
regex = r'[0-9]{1,2}\s[a-zA-Z]+[.,]*\s[0-9]{4}'
result = re.findall(regex, dateStr)
print(result)

Output:

['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']
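
As a quick check of scope, the same pattern can be run against the original assignment string quoted in the question's edit. In this sketch it picks up only the day-month-year forms, so numeric dates, month-first dates, and bare years would need additional patterns. The variable names are illustrative:

```python
import re

full_str = ("04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; "
            "Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; "
            "20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; "
            "Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010")

# Day, whitespace, month word, optional punctuation, whitespace, four digit year.
day_month_year = r'[0-9]{1,2}\s[a-zA-Z]+[.,]*\s[0-9]{4}'

found = re.findall(day_month_year, full_str)
print(found)  # ['20 Mar 2009', '20 March 2009', '2 Mar. 2009', '20 March, 2009']
```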