I am using a dataframe which has a 'Date' column. I have used pd.to_datetime()
to convert this column format to yyyy-mm-dd. However, this format is getting switched to some other format at intermittent dates in the dataframe (eg: yyyy-dd-mm).
Date
2021-02-01 <----- this is 2nd Jan, 2021
2021-01-21 <----- this is 21st Jan, 2021
Further, I have also tried using df['Date'].dt.strftime('%y-%m-%d'), but this too has not helped.
I request some guidance on the following points:
- For any Date column, is it enough to just use pd.to_datetime() and rest assured that all dates will be parsed in the correct format?
- Or do I need to state the datetime format explicitly when calling pd.to_datetime()?
CodePudding user response:
The problem comes from how pandas parses dates.
When it receives 2021-02-01, it cannot know whether that means Feb 1st or Jan 2nd, so it applies its default rule: when the date starts with the year, the next field is taken as the month, which gives Feb 1st.
This ambiguity does not arise when parsing 2021-01-21: there is only one possible date, Jan 21st, because 21 cannot be a month.
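To make the ambiguity concrete, here is a small sketch using only the standard library's datetime.strptime (not pandas' own parser, so it only illustrates the idea). The variable names are made up for the example.

```python
from datetime import datetime

# The same text is valid under two different layouts.
ambiguous = "2021-02-01"
as_month_first = datetime.strptime(ambiguous, "%Y-%m-%d")  # read as Feb 1st
as_day_first = datetime.strptime(ambiguous, "%Y-%d-%m")    # read as Jan 2nd

# With 21 in the last field there is only one valid reading,
# because 21 cannot be a month.
unambiguous = "2021-01-21"
only_reading = datetime.strptime(unambiguous, "%Y-%m-%d")  # Jan 21st

try:
    datetime.strptime(unambiguous, "%Y-%d-%m")
    day_first_ok = True
except ValueError:
    # Reading 21 as the month is rejected.
    day_first_ok = False
```

So a parser that has to guess can silently pick the wrong reading for the first string, while the second one leaves it no choice.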
Take a look at the to_datetime documentation and its dayfirst and format parameters, which let you force a given interpretation when a string can be parsed in more than one way.
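As a rough sketch of both options: the question does not show the raw strings, so the code below assumes they are day-first text such as 02/01/2021 (dd/mm/yyyy). That input and the variable names are assumptions for illustration only; adjust the format string to whatever your data really looks like.

```python
import pandas as pd

# Hypothetical raw input: assumed to be day-first strings (dd/mm/yyyy).
df = pd.DataFrame({"Date": ["02/01/2021", "21/01/2021"]})

# Option 1: spell out the exact layout, so nothing is left to guessing.
parsed = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Option 2: only tell pandas that the day comes before the month.
parsed_dayfirst = pd.to_datetime(df["Date"], dayfirst=True)

# strftime does not re-parse anything: it only turns the already parsed
# datetimes into text, so it cannot repair a wrong day/month reading.
as_text = parsed.dt.strftime("%Y-%m-%d")
```

Passing format is usually the safer of the two when every row has the same layout, since a row that does not match should raise an error instead of being reinterpreted silently.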