Datetime format of a Pandas dataframe column switching randomly [duplicate]

Time:10-01

I am working with a dataframe that has a 'Date' column. I used pd.to_datetime() to convert this column to the yyyy-mm-dd format. However, for some rows the dates appear to come out in a different format (e.g. yyyy-dd-mm):

Date 
2021-02-01 <----- this is 2nd Jan, 2021
2021-01-21 <----- this is 21st Jan, 2021

I have also tried df['Date'].dt.strftime('%y-%m-%d'), but this did not help either.
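A minimal sketch of why strftime cannot repair this: it only renders timestamps that pandas has already parsed, so a value stored as 1 February stays 1 February. The single-value Series below is a hypothetical stand-in for the 'Date' column. Note also that '%y' gives a two-digit year; '%Y' is the four-digit yyyy.

```python
import pandas as pd

# Hypothetical one-row column; the string is parsed as 1 February 2021
s = pd.to_datetime(pd.Series(["2021-02-01"]))

# strftime only turns the stored timestamps into text; it cannot swap day and month
formatted = s.dt.strftime("%y-%m-%d")
print(formatted.tolist())   # ['21-02-01']  (two-digit year because of %y)
print(s.dt.month.tolist())  # [2]  -> still February
```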

I would appreciate guidance on the following points:

  1. For any Date column, is it enough to just use pd.to_datetime() and rest assured that all dates will be parsed correctly?
  2. Or do I need to state the datetime format explicitly along with pd.to_datetime()?

Answer:

The problem comes from how pandas parses dates. When it receives 2021-02-01, it cannot know whether this means Feb 1st or Jan 2nd, so it applies its default rule: when the date starts with the year, the next field is the month, so the result is Feb 1st. This is not the case when parsing 2021-01-21: there is only one possible date, Jan 21st.
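A small sketch of the default behaviour described above, parsing two scalar strings with no format given:

```python
import pandas as pd

# Ambiguous year-first string: by default pandas reads it as yyyy-mm-dd
ambiguous = pd.to_datetime("2021-02-01")
print(ambiguous.month, ambiguous.day)      # 2 1  -> February 1st, not January 2nd

# Unambiguous string: only one valid date exists
unambiguous = pd.to_datetime("2021-01-21")
print(unambiguous.month, unambiguous.day)  # 1 21 -> January 21st
```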

Take a look at the to_datetime documentation, in particular its dayfirst and format parameters, which force a given interpretation when a string can be parsed in more than one way.
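A minimal sketch of the format approach, assuming (hypothetically) that the raw column is stored as yyyy-dd-mm in every row. An explicit format is used here rather than dayfirst, because the pandas documentation describes dayfirst as a preference rather than a strict rule.

```python
import pandas as pd

# Hypothetical raw column, assumed to be yyyy-dd-mm throughout
df = pd.DataFrame({"Date": ["2021-02-01", "2021-21-01"]})

# Explicit format: the field after the year is the day, the last field is the month
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%d-%m")

# Render back as yyyy-mm-dd to check the result
print(df["Date"].dt.strftime("%Y-%m-%d").tolist())  # ['2021-01-02', '2021-01-21']
```

With an explicit format, a value that does not match it raises an error instead of being silently reinterpreted, which answers the second question above: when the source format is known, stating it is the safer choice.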
