I have a CSV file that contains information about the period August 22, 2022 up to September 21, 2022. I loaded to CSV into Python using the Pandas library. The timestamps in the CSV file are in a Dutch format (and are strings), i.e., %d-%m-%Y. When I use pd.to_datetime()
for the timestamps, not all data points are converted correctly. For example:
Old (in %d-%m-%Y) | New |
---|---|
22-02-22 | 2022-08-22 (%Y-%m-%d) |
31-08-22 | 2022-08-31 (%Y-%m-%d) |
01-09-22 | 2022-01-09 (%Y-%d-%m) |
06-09-22 | 2022-06-09 (%Y-%d-%m) |
12-09-22 | 2022-12-09 (%Y-%d-%m) |
13-09-22 | 2022-09-13 (%Y-%m-%d) |
21-09-22 | 2022-09-21 (%Y-%m-%d) |
So, for some data points the months and days are interchanged. I want to convert the strings into the right datetime format. How to solve this?
Thanks in advance!
CodePudding user response:
You have to specify the format during conversation because the dates here can be confusing to pandas. The default format is %Y-%m-%d.
So try this instead:
df.<your_date_col> = pd.to_datetime(df.<your_date_col>, format='%d-%m-%y')
CodePudding user response:
If you read the docs https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html you will see that pd.to_datetime
can take yearfirst
and dayfirst
as parameters.
In your case, just pass dayfirst=True
to it and that's it.