Hello there stackoverflow community,
I would like to change the datetime format of a column, but I doesn't work and I don't know what I'am doing wrong.
After executing the following code:
df6['beginn'] = pd.to_datetime(df6['beginn'], unit='s', errors='ignore')
I got this output, and thats fine, but i would like to take out the hour to have only %m/%d/%Y
left.
ID DATE
91060 2017-11-10 00:00:00
91061 2022-05-01 00:00:00
91062 2022-04-01 00:00:00
Name: beginn, Length: 91063, dtype: object
I've tried this one and many others
df6['beginn'] = df6['beginn'].dt.strftime('%m/%d/%Y')
and get the following output:
AttributeError: Can only use .dt accessor with datetimelike values.
But I don't understand why, I've transformed the data with pd.to_datetime
or not?
Appreciate any hint you can give me! Thanks a lot!
CodePudding user response:
The reason you have to use errors="ignore"
is because not all the dates you are parsing are in the correct format. If you use errors="coerce"
like @phi has mentioned then any dates that cannot be converted will be set to NaT
. The columns datatype will still be converted to datatime64 and you can then format as you like and deal with the NaT
as you want.
Example
A dataframe with one item in Date
not written as Year/Month/Day (25th Month is wrong):
>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Using errors="ignore"
:
>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Column Date
is still an object because not all the values have been converted. Running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
will result in the AttributeError
Using errors="coerce"
:
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
ID Date
0 91060 2017-11-10
1 91061 2022-05-01
2 91062 2022-04-01
3 91063 NaT
>>> df.dtypes
ID int64
Date datetime64[ns]
dtype: object
Invalid dates are set to NaT and the column is now of type datatime64 and you can now format it:
>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
ID Date
0 91060 11/10/2017
1 91061 05/01/2022
2 91062 04/01/2022
3 91063 NaN
Note: When formatting datatime64, it is converted back to type object so NaT's are changed to NaN. The issue you are having is a case of some dirty data not in the correct format.