I have a dataframe like this:
import pandas as pd
from datetime import datetime
df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})
try:
    pd.to_datetime(df1['date'])
except ValueError:
    print("second is wrong")
As we can see, the second value does not match the datetime format %Y-%m-%d. What should I write, using a pandas method, to produce the desired output like this:
true
false
true
true
rather than using
try:
    res = bool(datetime.strptime(test_str, format))
except ValueError:
    res = False
to test every row of the dataframe in a for loop?
EDIT: This is just a test dataframe; the real one is much larger. I want to avoid a for loop to make it run faster (as far as I know, apply with a lambda runs faster than a for loop). A native pandas method would be even better.
CodePudding user response:
Using try and except is actually good practice. You can wrap it in a function and apply it to the column:
import pandas as pd
from datetime import datetime
def datetime_if_possible(dt):
    # Return the parsed datetime, or a default when the string is not a full ISO date
    try:
        return datetime.fromisoformat(dt)
    except ValueError:
        return None  # or any other default value
df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})
df1['parsed'] = df1['date'].apply(datetime_if_possible)
         date     parsed
0  2016-02-02 2016-02-02
1        2013        NaT
2  2015-02-03 2015-02-03
3  2014-03-02 2014-03-02
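Since the question asks for a native pandas method, here is a minimal sketch of a vectorized alternative that skips apply altogether. It assumes a recent pandas release (2.x), where an explicit format is matched strictly; older releases may be more lenient with partial strings such as '2013'. The names parsed and is_valid are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})

# errors='coerce' turns every value that does not fit the format into NaT
# instead of raising ValueError for the whole column
parsed = pd.to_datetime(df1['date'], format='%Y-%m-%d', errors='coerce')

# Boolean mask: True where parsing succeeded, False where it produced NaT
is_valid = parsed.notna()
print(is_valid.tolist())
```

The mask can be used directly for filtering, for example df1[is_valid], and the parsed column is already datetime64, so no second conversion is needed.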
CodePudding user response:
The pandas apply() function is one of the simplest ways to avoid a for loop, and it lets you check against a predefined format. This answer extends the one provided by @Raphael, with the flexibility to choose the format and print the required boolean values.
from datetime import datetime
import pandas as pd
def check_date(date):
    # True if the string matches %Y-%m-%d, False otherwise
    try:
        datetime.strptime(date, "%Y-%m-%d")
        return True
    except ValueError:
        return False
df1 = pd.DataFrame({'date': ['2016-02-02', '2013', '2015-02-03', '2014-03-02']})
print(df1["date"].apply(check_date))
CodePudding user response:
Use str.match
m = df1['date'].str.match(r'\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$')
print(m)
0     True
1    False
2     True
3     True
Name: date, dtype: bool
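One caveat with a regex: it only checks the shape of the string, so an impossible date such as '2015-02-31' still matches. A rough sketch of one way to cover that, assuming you want both a strict shape and a real calendar date, is to combine str.fullmatch with pd.to_datetime(..., errors='coerce'). The sample series and variable names here are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the third entry has the right shape but is not a real date,
# and the fourth has trailing junk
s = pd.Series(['2016-02-02', '2013', '2015-02-31', '2014-03-02x'])

# Shape check only: four digits, dash, two digits, dash, two digits, nothing else
has_shape = s.str.fullmatch(r'\d{4}-\d{2}-\d{2}')

# Calendar check: values that cannot be parsed as a date become NaT
parsed = pd.to_datetime(s, format='%Y-%m-%d', errors='coerce')

# A value counts as valid only if it passes both checks
is_valid = has_shape & parsed.notna()
print(is_valid.tolist())
```

Both steps are vectorized, so this should scale to a large column without a Python-level loop.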