I have a date columns with multiple dates:
Date
2022-01-01 00:00:00
jan 20
january 19
How can I convert them, in a scalable way (without dictionary), to a proper date time format?
I tried:
df['Date_1'] = pd.to_datetime(df['Date'], errors='coerce').astype(str)
df['Date_2'] = pd.to_datetime(df['Date'], errors='coerce', ,yearfirst = False, format = '%B %y')).astype(str)
df['Date1'] = df['Date1'].str.replace('NaT','')
df['Date2'] = df['Date2'].str.replace('NaT','')
Then, I merged the two columns, with:
df['Date3'] = df['Date1'] df['Date2']
But, this is not working, since I need to create another format (for the not-abbreviation months). But when adding the logic above, but then changing the %B for %b, it is duplicating some months (like may, which is both: an abbreviation and full month).
I would like to have the end result:
2022-01-01
2020-01-01
2019-01-01
CodePudding user response:
There is no direct way to handle all formats at once.
What you can do is use successive methods. Here I combined the "january 19" and "jan 20" using a regex. You can use additional .fillna(<new_converter>)
should you find more formats in the future.
(pd
.to_datetime(df['Date'], errors='coerce')
.fillna(pd.to_datetime(df['Date'].str.replace('([a-z]{3})[a-z] ', r'\1', regex=True),
errors='coerce', yearfirst=False, format='%b %y')
)
)
output:
0 2022-01-01
1 2020-01-01
2 2019-01-01
Name: Date, dtype: datetime64[ns]
CodePudding user response:
Use combine_first
to try it with a variety of different date formats:
date = pd.to_datetime(df["Date"], errors="coerce")
for format in ["%b %y", "%B %y"]:
date = date.combine_first(pd.to_datetime(df["Date"], format=format, errors="coerce"))
df["Date"] = date