I have a dataframe with a column of dates in the format MMDDYYYY. I want to convert the dates into the format YYYY-MM-DD. This works for most dates, but 7-digit values whose month is January (written as a single 1, without the leading zero) come out wrong: 1161955 is parsed as month 11, day 6 instead of month 1, day 16. In this example, the last 3 rows are wrong. There are many rows, so I cannot hardcode the correct values.
OriginalDates (MMDDYYYY)   OutputDates (YYYY-MM-DD)   ExpectedDates (YYYY-MM-DD)   Correct Output?
5011989                    1989-05-01                 1989-05-01                   Yes
6011989                    1989-06-01                 1989-06-01                   Yes
12042009                   2009-12-04                 2009-12-04                   Yes
01012001                   2001-01-01                 2001-01-01                   Yes
1161955                    1955-11-06                 1955-01-16                   No
1051991                    1991-10-05                 1991-01-05                   No
1011933                    1933-10-01                 1933-01-01                   No
My code:
df['OutputDates'] = pd.to_datetime(df['OriginalDates'], format='%m%d%Y')
df['OutputDates'] = pd.to_datetime(df['OutputDates'], format='%Y-%m-%d')
CodePudding user response:
Here is a solution using string slicing. It is not the cleanest approach, but it does what you need:
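import pandas as pd

def format_date(x):
    # 7 characters: single-digit month (MDDYYYY), rearranged to YYYY-DD-M
    if len(x) == 7:
        return x[-4:] + '-' + x[1:3] + '-' + x[0:1]
    # 8 characters: two-digit month (MMDDYYYY), rearranged to YYYY-DD-MM
    if len(x) == 8:
        return x[-4:] + '-' + x[2:4] + '-' + x[0:2]

# The rearranged strings are year-day-month, hence format='%Y-%d-%m'
df['OriginalDates (MMDDYYYY)'] = df['OriginalDates (MMDDYYYY)'].apply(lambda x: format_date(str(x)))
df['OriginalDates (MMDDYYYY)'] = pd.to_datetime(df['OriginalDates (MMDDYYYY)'], format='%Y-%d-%m')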
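A possibly simpler alternative to branching on the string length is to left-pad every value to 8 characters, so the month always takes two digits before parsing. The sketch below uses only the standard library and assumes, as the expected output in the question implies, that the day is always written with two digits and only the month can lose its leading zero. The helper name to_iso_date and the samples list are illustrative, not part of the original code.

```python
from datetime import datetime

def to_iso_date(raw):
    """Convert an MDDYYYY or MMDDYYYY value to a 'YYYY-MM-DD' string."""
    # Assumption: the day always has two digits, so a 7-character value
    # is only missing the leading zero of the month.
    padded = str(raw).zfill(8)
    return datetime.strptime(padded, '%m%d%Y').strftime('%Y-%m-%d')

# Sample values taken from the table in the question
samples = ['5011989', '12042009', '01012001', '1161955', '1051991', '1011933']
converted = [to_iso_date(s) for s in samples]
print(converted)
```

On a dataframe, the same idea should work either by applying the helper to the column with .apply, or by padding with .astype(str).str.zfill(8) and passing the result to pd.to_datetime with format='%m%d%Y'.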