I have a dataframe in which there are some columns. One of them is Date but some dates are in the format of dd-MMM-YY (eg: 03-May-2022) and some are in the format dd-mm-yy(eg: 03-05-2022). How do I change all the dates in the column to one format(eg: dd-mm-yy)?
Initial Dataframe:
Serial | Date |
---|---|
0 | 23-05-2022 |
1 | 14-Mar-2022 |
2 | 29-02-2020 |
Required Output:
Serial | Date |
---|---|
0 | 23-05-2022 |
1 | 14-03-2022 |
2 | 29-02-2020 |
CodePudding user response:
I can think of one solution where you handle the two different cases based on length of the date string:
def format_date(date):
reformatter = {
"Jan": "01",
"Feb": "02",
"Mar": "03",
...
"Dec": "12"
}
if len(date) == 10:
return date
else:
return date[:4] reformatter[date[3:6]] date[6:]
df["Date"].apply(lambda x: format_date(x))
CodePudding user response:
You can try to use automatic date parsing:
df['Date'] = (pd.to_datetime(df['Date'], infer_datetime_format=True)
.dt.strftime('%d-%m-%Y')
)
Or, using str.replace
:
from calendar import month_abbr
d = dict(zip(month_abbr, [f'{i:02d}' for i in range(13)]))
df['Date'] = df['Date'].str.replace(r'[A-Z][a-z] ', lambda x: d.get(x.group(), x), regex=True)
Output:
Serial Date
0 0 23-05-2022
1 1 14-03-2022
2 2 29-02-2020