Home > other >  Multiple date formats, towards one proper format
Multiple date formats, towards one proper format

Time:05-11

I have a date columns with multiple dates:

Date
2022-01-01 00:00:00
jan 20
january 19

How can I convert them, in a scalable way (without dictionary), to a proper date time format?

I tried:

df['Date_1'] = pd.to_datetime(df['Date'], errors='coerce').astype(str)

df['Date_2'] = pd.to_datetime(df['Date'], errors='coerce', ,yearfirst = False, format = '%B %y')).astype(str)

df['Date1'] = df['Date1'].str.replace('NaT','')
df['Date2'] = df['Date2'].str.replace('NaT','')

Then, I merged the two columns, with:

df['Date3'] = df['Date1']   df['Date2']

But, this is not working, since I need to create another format (for the not-abbreviation months). But when adding the logic above, but then changing the %B for %b, it is duplicating some months (like may, which is both: an abbreviation and full month).

I would like to have the end result:

2022-01-01
2020-01-01
2019-01-01

CodePudding user response:

There is no direct way to handle all formats at once.

What you can do is use successive methods. Here I combined the "january 19" and "jan 20" using a regex. You can use additional .fillna(<new_converter>) should you find more formats in the future.

(pd
 .to_datetime(df['Date'], errors='coerce')
 .fillna(pd.to_datetime(df['Date'].str.replace('([a-z]{3})[a-z] ', r'\1', regex=True),
                        errors='coerce', yearfirst=False, format='%b %y')
        )
)

output:

0   2022-01-01
1   2020-01-01
2   2019-01-01
Name: Date, dtype: datetime64[ns]

CodePudding user response:

Use combine_first to try it with a variety of different date formats:

date = pd.to_datetime(df["Date"], errors="coerce")

for format in ["%b %y", "%B %y"]:
    date = date.combine_first(pd.to_datetime(df["Date"], format=format, errors="coerce"))

df["Date"] = date
  • Related