So I have a issue around dates that are coming from a excel sheet which I'm transforming into a CSV and then loading into a data frame. Basically the data I'm dealing with each day can come in two different formats. These two date columns are called Appointment Date
and Attended Date
I'm dealing with (DD/MM/YYYY HH:MM) and (YYYY/MM/DD HH:MM) and its coming from a third party so I cant set the date format structure. What i need to do is parse the data and remove the HH:MM and output the data only has DD/MM/YYYY.
My current code is currently the following:
df['Appointment Date'] = df['Appointment Date'].str.replace(' ', '/', regex=True)
df['Attended Date'] = df['Attended Date'].str.replace(' ', '/', regex=True)
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
df['Attended Date'] = pd.to_datetime(df['Attended Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
But I'm not able to parse the data when it comes through as YYYY/MM/DD HH:MM Exception error:
time data '2021-10-08/00:00:00' does not match format '%d/%m/%Y/%H:%M' (match)
Any ideas on how i can get around this?
CodePudding user response:
Try it one way, and if it doesn't work, try it the other way.
try:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M").dt.strftime("%d/%m/%Y")
Of course, instead of WhateverDateParseException
use the actual exception that is raised in your code.
CodePudding user response:
I would use regular expressions for that as follows:
import pandas as pd
df = pd.DataFrame({"daytime": ["31/12/2020 23:59", "2020/12/31 23:59"]})
df["daypart"] = df["daytime"].str.replace(r" \d\d:\d\d","") # drop HH:MM part
df["day"] = df["daypart"].str.replace(r"(\d\d\d\d)/(\d\d)/(\d\d)", r"\3/\2/\1")
print(df)
output
daytime daypart day
0 31/12/2020 23:59 31/12/2020 31/12/2020
1 2020/12/31 23:59 2020/12/31 31/12/2020
Explanation: I used so-called capturing groups in second .replace
, if there is (4 digits)/(2 digits)/(2 digits) their order is re-arranged that 3rd become 1st, 2nd become 2nd and 1st become 3rd (note that group are 1-based, not 0-base like is case with general python
indexing). AS day
format is now consistent you could be able to parse it easily.
CodePudding user response:
As mentioned by @C14L that method can be followed but my guess seeing your exception is you need to add a seconds format (%S) to your time formatting, so the updated code wld be like
try:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")