Home > Net >  Convert date column (string) to datetime and match the format in Python
Convert date column (string) to datetime and match the format in Python

Time:09-02

I have a similar question to this: Convert date column (string) to datetime and match the format and I want to convert a string like '12/7/21' to '2021-07-12' as a date object. I believe the answer given in the link above is wrong and here is why:

# The suggested solution on Stackoverflow 
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21']})
>>> df['Date']
0    15/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0   2021-07-15
Name: Date, dtype: datetime64[ns]

Because Python doesn't care about the specified format in the above code! If you simply change 15 to 12 and input '12/7/21' then it treats 12 as month instead of day:

>>> df = pd.DataFrame({'Date':['12/7/21']})
>>> df['Date']
0    12/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0   2021-12-07
Name: Date, dtype: datetime64[ns]

Does anyone know what's the best solution to this problem? (In R you simply use lubridate::dmy(df$Date) and it works perfectly)

CodePudding user response:

.astype('datetime64') attempts to parse the string MM/DD/YY however if it can't (in the case that MM > 12) it will fall back to parsing as DD/MM/YY this is why you see seemingly inconsistent behaviour:

>>> import pandas as pd
>>> pd.Series('15/7/21').astype('datetime64')
0   2021-07-15
dtype: datetime64[ns]
>>> pd.Series('14/7/21').astype('datetime64')
0   2021-07-14
dtype: datetime64[ns]
>>> pd.Series('13/7/21').astype('datetime64')
0   2021-07-13
dtype: datetime64[ns]
>>> pd.Series('12/7/21').astype('datetime64')
0   2021-12-07
dtype: datetime64[ns]

The way to solve this is just to pass a Series of strings to pd.to_datetime instead of intermediately converting to datetime64s. So you can simply do

pd.to_datetime(df['Date'], format='%d/%m/%y')

without the .astype cast

  • Related