I am facing issues when converting a date from dd/mm/yy to dd/mm/yyyy with pandas.
I have tried this:
df['Date'] = pd.to_datetime(df['Date'])
But somehow, for some values, the day and month get swapped... Is there another way to convert them correctly?
Thanks in advance for answering me :)
Short Version
You can specify the format, but that doesn't mean there won't be any problems due to the Y2K bug inherent in two-digit years:
df['Date'] = pd.to_datetime(df['Date'],format='%d/%m/%y')
The string '10/12/68' will be parsed as December 10, 2068.
If possible, get whoever creates these values to use the unambiguous ISO 8601 format, i.e. YYYY-MM-DD.
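A minimal sketch of the short version on some made-up sample data (the `df` contents below are hypothetical, not from the question). With an explicit `format`, every row is read day-first, so nothing gets swapped, but the two-digit year '68' still lands in 2068:

```python
import pandas as pd

# Hypothetical sample data in dd/mm/yy form
df = pd.DataFrame({'Date': ['12/05/12', '25/12/99', '10/12/68']})

# An explicit format forces day/month/two-digit-year for every row
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')

print(df['Date'].tolist())
# 12 May 2012, 25 Dec 1999, and 10 Dec 2068 (not 1968!)
```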
Long Explanation
The correct way is to simply not use such strings. There's no way even a human could guess what xx/xx/xx means. What does 12/5/12 mean? December 5 or May 12? 1912 or 2012? How about 10/12/68?
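To make the ambiguity concrete, here is a small sketch using only the standard library: the very same string is a valid date under both the day-first and the month-first convention, so no parser can know which one was meant.

```python
from datetime import datetime

s = '12/5/12'

# Read as day/month/year
day_first = datetime.strptime(s, '%d/%m/%y').date()
# Read as month/day/year
month_first = datetime.strptime(s, '%m/%d/%y').date()

print(day_first)    # 2012-05-12
print(month_first)  # 2012-12-05
```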
This isn't nitpicking. Such bugs are still causing serious problems even now:
- The Spanish family wrongly accused of child pornography due to a mistake reading a date (September 2020)
- Y2K? How about Y2.02K as Lloyds suffers its second TITSUP* of the year (January 2020)
- Y2K quick-fix crick? 1920s come roaring back after mystery blip at UK's vehicle licensing agency (January 2020)
The real fix is to modify the application that generates such dates to use the unambiguous YYYY-MM-DD format, or to use a source that has strongly typed dates. For example, Excel and almost all databases have date types.
Workarounds
When that's not possible, you need to specify the date pattern using the format parameter.
df['Date'] = pd.to_datetime(df['Date'],format='%d/%m/%y')
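Since the question asks for dd/mm/yyyy, note that after parsing, the column holds real datetimes rather than strings. If you need dd/mm/yyyy *text* (for display or export), a sketch like the following formats it back out with `Series.dt.strftime`; the sample values and the `DateText` column name are just for illustration:

```python
import pandas as pd

# Hypothetical input in dd/mm/yy form
df = pd.DataFrame({'Date': ['01/02/03', '31/12/99']})

# Parse with an explicit day-first pattern
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')

# Render as dd/mm/yyyy strings; %Y gives the four-digit year
df['DateText'] = df['Date'].dt.strftime('%d/%m/%Y')

print(df)
```

Keep the datetime column for sorting and arithmetic, and only produce the string version at the very end.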
There's still a chance for errors, though, because the two-digit year is translated using a cutoff rule: Python's strptime maps 69-99 to 1969-1999 and 00-68 to 2000-2068, and pandas applies the same rule.
For example, in this SO question '10/12/68' was parsed as 2068-12-10 instead of 1968:
>>> from datetime import datetime
>>> datetime.strptime('10/12/68', '%d/%m/%y')
datetime.datetime(2068, 12, 10, 0, 0)
Oops
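If you know something about the data, you can patch this after parsing. The sketch below is a common heuristic, not a pandas feature, and it rests on an assumption you must verify for your own data: that no date can legitimately lie in the future (e.g. birth dates or past transactions). Any parsed date after today is then shifted back 100 years:

```python
import pandas as pd

# Hypothetical data; '68' would otherwise become 2068
df = pd.DataFrame({'Date': ['10/12/68', '01/01/05']})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')

# ASSUMPTION: the data never contains future dates, so anything
# after "now" must really belong to the previous century.
is_future = df['Date'] > pd.Timestamp.now()

# Replace only the future rows with the same date 100 years earlier
df['Date'] = df['Date'].mask(is_future, df['Date'] - pd.DateOffset(years=100))

print(df['Date'].tolist())
```

This only moves the guesswork around: a genuine 2068 date, or a 1905 date written as '05', will still come out wrong. That's why fixing the source format is the real solution.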