I am reading an excel files :
dfEx=pd.read_excel(fileExcel, engine='openpyxl')
There is a date column in Object format, and dfEx["Date"]
gives me these results :
id | date |
---|---|
0 | 15/10/2022 10:44:59 |
1 | 15/10/2022 10:47:53 |
2 | 15/10/2022 10:49:50 |
3 | 15/10/2022 10:50:40 |
4 | 15/10/2022 10:54:05 |
... | |
174 | 2022-10-16 00:00:00 |
175 | 2022-10-16 00:00:00 |
176 | 2022-10-16 00:00:00 |
177 | 2022-10-16 00:00:00 |
178 | 2022-10-16 00:00:00 |
I don't get why I have the format change between the beginning and the end, as in the Excel file the separator character is always / for year/month/day
The syntax I have in the Excel file is :
15/10/2022 11:13:64
for the first and
15-10-2022
for the rest
So when I use the datetime function it doesn't work :
dfEx['Date'] = pd.to_datetime(dfEx['Date'], format='%d/%m/%Y %H:%M:%S', exact=False)
ValueError: time data '2022-10-15 00:00:00' does not match format '%d/%m/%Y %H:%M:%S' (search)
CodePudding user response:
Add errors='coerce'
for missing values if no match format 15/10/2022 10:44:59
and then replace missing values by another to_datetime
for convert format 2022-10-16 00:00:00
:
dfEx['date'] = (pd.to_datetime(dfEx['date'], format='%d/%m/%Y %H:%M:%S', errors='coerce')
.fillna(pd.to_datetime(dfEx['date'], errors='coerce')))
print (dfEx)
id date
0 0 2022-10-15 10:44:59
1 1 2022-10-15 10:47:53
2 2 2022-10-15 10:49:50
3 3 2022-10-15 10:50:40
4 4 2022-10-15 10:54:05
5 174 2022-10-16 00:00:00
6 175 2022-10-16 00:00:00
7 176 2022-10-16 00:00:00
8 177 2022-10-16 00:00:00
9 178 2022-10-16 00:00:00
CodePudding user response:
I see that you have two different dates formats in excel:
- Ones with dashes
2022-10-16 00:00:00
- Others with slashes
15/10/2022 10:54:05
So you cannot parse all the dates at once, you first have to split the dataframes:
df_slashes = df[df.Date.str.contains("/", regex=False))
df_dashes = df[df.Date.str.contains("-", regex=False))
Then for each dataframe run the corresponding to_datetime
function:
df_slashes_dates = pd.to_datetime(df_slashes.Date, format='%d/%m/%Y %H:%M:%S')
df_dashes_dates = pd.to_datetime(df_dashes.Date, format='%d-%m-%Y %H:%M:%S')
Finally do whatever you want with these two series.