I have a list of strings with different values, and I'm trying to find the strings in the list that are dates and return the index of each date. I tried using the dateutil parser like this:
from dateutil.parser import parse
x = ["test", "Hello", "abc", "27.02.2020"]
for item in x:
    if parse(item) == True:
        print(x.index(item))
This does not work since most of the strings in my list are not dates, and the parser fails on strings that are not dates. Does anyone have a suggestion for how I could solve this differently?
CodePudding user response:
There are many ways to do it. The easiest is to use pandas.to_datetime(), which raises an exception (a ValueError or one of its subclasses, such as ParserError) if the string is not a date:
import pandas as pd  # pip install pandas

def is_date(date_string):
    try:
        pd.to_datetime(date_string, format='%d.%m.%Y')
        return True
    except Exception:
        return False

x = ["test", "Hello", "abc", "27.02.2020"]
for index, item in enumerate(x):
    if is_date(item):
        print(index)
CodePudding user response:
The simplest solution is to use a regular expression:
import re

x = ["test", "Hello", "abc", "27.02.2020"]
for item in x:
    # fullmatch requires the whole string to have the DD.MM.YYYY shape
    if re.fullmatch(r'[0-9]{2}\.[0-9]{2}\.[0-9]{4}', item):
        print(item, 'is a date')
Note that this only checks the shape of the string; it cannot validate the date itself, so it will not know that the 32nd of December is not a valid date.
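If validation matters, one option is to keep the regular expression as a quick shape check and then let datetime.strptime reject impossible dates. The following is only a minimal sketch: the helper name is_valid_date and the extra "32.12.2020" entry are made up for illustration, and it assumes all dates use the DD.MM.YYYY format from the question.

```python
import re
from datetime import datetime

# Shape check only: two digits, a dot, two digits, a dot, four digits.
DATE_SHAPE = re.compile(r"[0-9]{2}\.[0-9]{2}\.[0-9]{4}")

def is_valid_date(text):
    """Return True if text has the DD.MM.YYYY shape and is a real calendar date."""
    if not DATE_SHAPE.fullmatch(text):
        return False
    try:
        # strptime raises ValueError for impossible dates such as day 32
        datetime.strptime(text, "%d.%m.%Y")
    except ValueError:
        return False
    return True

# "32.12.2020" is an illustrative entry added to show the validation step
x = ["test", "Hello", "abc", "27.02.2020", "32.12.2020"]
date_indices = [i for i, item in enumerate(x) if is_valid_date(item)]
print(date_indices)  # [3]
```

Using enumerate instead of x.index(item) also gives the right index when the same string appears more than once in the list.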
CodePudding user response:
There are different approaches for this, including the one proposed above: pattern matching.
If all the dates in your list have the same format, you could write a helper function that tries to parse the string with a given format and, if a ValueError is raised because it is not a date, returns None or whatever you prefer:
from datetime import datetime as dt

def try_parse(x, date_format="%d.%m.%Y"):
    try:
        return dt.strptime(x, date_format)
    except ValueError:
        return None

lst = ["test", "Hello", "abc", "27.02.2020"]
[try_parse(x) for x in lst]
OUTPUT
[None, None, None, datetime.datetime(2020, 2, 27, 0, 0)]
The advantage of this is that the only library needed is datetime. In addition, you could make it more robust by passing a list of expected date formats and trying each of them in turn, so you are not limited to just %d.%m.%Y, and falling back to a default if no parsing is successful, as in the solution above.
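The multi-format idea could look roughly like this. It is a sketch under assumptions: the function name try_parse_any, the DATE_FORMATS tuple, and the "2020-02-27" entry are hypothetical and should be adapted to the formats you actually expect.

```python
from datetime import datetime

# Hypothetical list of accepted formats; adjust to your data.
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y")

def try_parse_any(text, formats=DATE_FORMATS, default=None):
    """Return the first successful parse of text, or default if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    return default

# "2020-02-27" is an illustrative entry in a second format
lst = ["test", "Hello", "2020-02-27", "27.02.2020"]
parsed = [try_parse_any(s) for s in lst]

# Indices of the entries that parsed as dates, as asked in the question
date_indices = [i for i, value in enumerate(parsed) if value is not None]
print(date_indices)  # [2, 3]
```

The order of the formats matters when a string could match more than one of them, so put the format you trust most first.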
Anyway, I think this is essentially what pandas.to_datetime does, so you could simply do:
import pandas as pd

[pd.to_datetime(y, errors="coerce") for y in lst]
OUTPUT
[NaT, NaT, NaT, Timestamp('2020-02-27 00:00:00')]
The errors="coerce" option makes the method return NaT when parsing fails.