I have a column, which represents date, however is formatted as string. I can't use simple pandas.to_datetime like:
01/02/2023
Apr 02, 2016
Jun 2021
2023/12/01
I've tried to create a below formula, where I would list of potential date formats and using for loop to convert the column into desired format. However, it is obviously wrong as the column contains NONEs after applying it. Could you advise me better direction or what should I change, please?
def DateFormat(data):
for fmt in ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%b/%d'):
try:
pd.to_datetime(data['date'], format=fmt)
except ValueError:
pass
data['date'] = data.apply(DateFormat, axis = 1)
BEFORE APPLY DateFormat: | ID | Date | | --- | -------------- | | 1 | 01/02/2023 | | 2 | Apr 02, 2016 | | 3 | Jun 2021 | | 4 | 2023/12/01 |
AFTER APPLY DateFormat: | ID | Date | | --- | ----- | | 1 | None | | 2 | None | | 3 | None | | 4 | None |
CodePudding user response:
If you don't use return
to return value then it uses return None
at the end of function.
You should use return pd.to_datetime(...)
And if you want to return original value when it can't convert it then you need return
at the end. OR you could use return
to return some default value.
def DateFormat(data):
for fmt in ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%b/%d'):
try:
return pd.to_datetime(data['date'], format=fmt)
except ValueError:
pass
# return original `date` if it couldn't convert
return data['date']
# or return some default value
#return datetime.datetime(1900, 1, 1)
CodePudding user response:
Why don't let Pandas infer your datetime format? and force dayfirst=True
for your first date format (%d/%m/%Y
)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
print(df)
# Output
ID Date
0 1 2023-02-01
1 2 2016-04-02
2 3 2021-06-01
3 4 2023-12-01