I have a column like this:
df.CalculationDateKey.head()
0 20201231
1 20201130
2 20201031
3 20200930
4 20200831
Name: CalculationDateKey, dtype: int64
I want to convert it to datetime.
I tried:
pd.to_datetime(df['CalculationDateKey']).head()
which yields:
0 1970-01-01 00:00:00.020201231
1 1970-01-01 00:00:00.020201130
2 1970-01-01 00:00:00.020201031
3 1970-01-01 00:00:00.020200930
4 1970-01-01 00:00:00.020200831
Name: CalculationDateKey, dtype: datetime64[ns]
I want this so I can calculate the difference in months between two dates.
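Once the dates are real datetime64 values, one way to get a month difference is to compare the year and month components through the .dt accessor. This is a minimal sketch that assumes "difference in months" means whole calendar months with the day of month ignored. The column names start and end are hypothetical and stand in for whichever two dates you compare.

```python
import pandas as pd

# Hypothetical example data: two already-parsed datetime columns
df = pd.DataFrame({
    "start": pd.to_datetime(["2020-08-31", "2020-09-30", "2020-12-31"]),
    "end": pd.to_datetime(["2020-12-31", "2020-12-31", "2021-03-31"]),
})

# Whole calendar months between the two dates (day of month is ignored)
df["months_diff"] = (
    (df["end"].dt.year - df["start"].dt.year) * 12
    + (df["end"].dt.month - df["start"].dt.month)
)
print(df["months_diff"].tolist())  # [4, 3, 3]
```

If partial months should count differently (for example, rounding by day), the formula needs adjusting; this version simply counts month boundaries crossed.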
Answer:
Don't let Pandas infer your date format; specify it explicitly:
>>> pd.to_datetime(df['CalculationDateKey'], format='%Y%m%d')
0 2020-12-31
1 2020-11-30
2 2020-10-31
3 2020-09-30
4 2020-08-31
Name: CalculationDateKey, dtype: datetime64[ns]
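A self-contained version of the same fix, rebuilding the sample values from the question into a small DataFrame. The new column name CalculationDate is a hypothetical choice; you could just as well overwrite CalculationDateKey.

```python
import pandas as pd

# The sample values from the question, stored as int64 YYYYMMDD keys
df = pd.DataFrame({"CalculationDateKey": [20201231, 20201130, 20201031, 20200930, 20200831]})

# An explicit format makes pandas parse each key as year/month/day
# instead of treating the integer as a nanosecond offset from 1970-01-01
df["CalculationDate"] = pd.to_datetime(df["CalculationDateKey"], format="%Y%m%d")

print(df["CalculationDate"].dt.strftime("%Y-%m-%d").tolist())
# ['2020-12-31', '2020-11-30', '2020-10-31', '2020-09-30', '2020-08-31']
```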
Answer:
When you convert integers to dates with to_datetime, Pandas's default unit is nanoseconds since the Unix epoch (January 1, 1970). That's why you're getting values like 1970-01-01 00:00:00.020201231: 20201231 nanoseconds is only about 0.02 seconds after midnight on that day.
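A small sketch that makes the default visible: converting the raw integers with no format gives the same result as asking for unit="ns" explicitly, and the underlying nanosecond count is just the original key.

```python
import pandas as pd

keys = pd.Series([20201231, 20201130], name="CalculationDateKey")

# With no format or unit, integers are read as nanoseconds since the Unix epoch
as_ns = pd.to_datetime(keys)
print(as_ns.iloc[0])  # 1970-01-01 00:00:00.020201231

# Spelling out the default unit gives the same result
assert as_ns.equals(pd.to_datetime(keys, unit="ns"))

# The stored nanosecond value is the original integer key
print(as_ns.iloc[0].value)  # 20201231
```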
As Corralien mentioned in their answer, you can pass a date format string so Pandas knows how to interpret those YYYYmmdd values:
pd.to_datetime(df['CalculationDateKey'], format='%Y%m%d')