I'm loading an Excel file with pandas using the parse_dates=True parameter. However, the date format can not change. When I open the file in Excel on my local computer, the date format is accurate, but when the file is loaded in Python, the format is incorrect. The issue is that "dmY acts like mdY for half of the data." I'm expecting the output with YYYY-mm-dd HH:MM:SS. Below is the code for the data
df = pd.DataFrame({'ID':[2914240.0,2914137.0,2929456.0,2920801.0],
'RT_Date':['2021-01-02 06:38:00','2021-01-02 02:58:00','02-19-2021 17:59:00','2021-09-02 12:49:00'],
'DateCreated':['01-31-2021 14:38:12','01-31-2021 10:57:50','02-19-2021 01:58:38','2021-09-02 11:35:51']})
The code which i tried is below
df = pd.read_excel('demo.xlsx', parse_dates=True)
But the above code is not working and returning the same output as shown in the image. Is there any issue with the data/pandas? Can anyone please help me with this?
CodePudding user response:
Using pd.to_datetime() allows for the use of the infer_datetime_format
argument which eases working with datetime values. In this particular case, if you'd like to parse all columns except for ID you can try:
df['RT_Date'],df['DateCreated'] = [pd.to_datetime(df[x],infer_datetime_format=True) for x in df if x != 'ID']
Based on your example, it outputs:
ID RT_Date DateCreated
0 2914240.0 2021-01-02 06:38:00 2021-01-31 14:38:12
1 2914137.0 2021-01-02 02:58:00 2021-01-31 10:57:50
2 2929456.0 2021-02-19 17:59:00 2021-02-19 01:58:38
3 2920801.0 2021-09-02 12:49:00 2021-09-02 11:35:51
Should you want only the date:
df['RT_Date'],df['DateCreated'] = [pd.to_datetime(df[x],infer_datetime_format=True).dt.date for x in df if x != 'ID']
Returning:
ID RT_Date DateCreated
0 2914240.0 2021-01-02 2021-01-31
1 2914137.0 2021-01-02 2021-01-31
2 2929456.0 2021-02-19 2021-02-19
3 2920801.0 2021-09-02 2021-09-02
EDIT: If there are multiple columns you wish to evaluate this for and no easy way, then you can create a list of candidate columns:
to_date_cols = ['RT_Date','DateCreated','Additional_Col1']
And use:
for x in to_date_cols:
df[x] = pd.to_datetime(df[x],infer_datetime_format=True)