I have a similar question to this: Convert date column (string) to datetime and match the format and I want to convert a string like '12/7/21' to '2021-07-12' as a date object. I believe the answer given in the link above is wrong and here is why:
# The suggested solution on Stackoverflow
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21']})
>>> df['Date']
0 15/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-07-15
Name: Date, dtype: datetime64[ns]
Because Python doesn't care about the specified format in the above code! If you simply change 15 to 12 and input '12/7/21' then it treats 12 as month instead of day:
>>> df = pd.DataFrame({'Date':['12/7/21']})
>>> df['Date']
0 12/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0 2021-12-07
Name: Date, dtype: datetime64[ns]
Does anyone know what's the best solution to this problem? (In R you simply use lubridate::dmy(df$Date) and it works perfectly)
CodePudding user response:
.astype('datetime64')
attempts to parse the string MM/DD/YY
however if it can't (in the case that MM > 12) it will fall back to parsing as DD/MM/YY
this is why you see seemingly inconsistent behaviour:
>>> import pandas as pd
>>> pd.Series('15/7/21').astype('datetime64')
0 2021-07-15
dtype: datetime64[ns]
>>> pd.Series('14/7/21').astype('datetime64')
0 2021-07-14
dtype: datetime64[ns]
>>> pd.Series('13/7/21').astype('datetime64')
0 2021-07-13
dtype: datetime64[ns]
>>> pd.Series('12/7/21').astype('datetime64')
0 2021-12-07
dtype: datetime64[ns]
The way to solve this is just to pass a Series
of strings to pd.to_datetime
instead of intermediately converting to datetime64
s. So you can simply do
pd.to_datetime(df['Date'], format='%d/%m/%y')
without the .astype
cast