Home > Back-end >  Format Pandas date array with mixed formats
Format Pandas date array with mixed formats

Time:11-24

I'm trying to unify dates in a column as they come in different formats; current date entries:

[... '18-Aug-21' '16-Aug-21' '17-Aug-21' '22-Aug-21' '21-Aug-21' '20-Aug-21' '19-Aug-21' '23-Aug-21' '24-Aug-21' '25-Aug-21' '28-Aug-21' '26-Aug-21' '27-Aug-21' '31-Aug-21' '30-Aug-21' '29-Aug-21' '06 Sep 2021' '07 Sep 2021' '23 Sep 2021' '17 Sep 2021' '18 Sep 2021' '30 Sep 2021' '11 Sep 2021' '12 Sep 2021' '20 Sep 2021' '15 Sep 2021' '16 Sep 2021' '08 Sep 2021' '09 Sep 2021' '24 Sep 2021' '25 Sep 2021' '03 Sep 2021' '10 Sep 2021' '19 Sep 2021' '01 Sep 2021' '29 Sep 2021' '26 Sep 2021' '27 Sep 2021' '13 Sep 2021' '14 Sep 2021' '02 Sep 2021' '04 Sep 2021' '05 Sep 2021' ...

#1: trying to replace the dash here doesn't work on all dates

#2: when the year is YY as in '6-Aug-21', how can I format it?

for date in DF_all["SALES_DATE"]:
    date = date.replace("-"," ")

DF_all["SALES_DATE"] = pd.to_datetime(DF_all["SALES_DATE"], format='%d 
%b%Y', errors='ignore')

print(DF_all["SALES_DATE"].unique())

Output:

[...'18-Aug-21' '16-Aug-21' '17-Aug-21' '22-Aug-21' '21-Aug-21' '20-Aug-21' '19-Aug-21' '23-Aug-21' '24-Aug-21' '25-Aug-21' '28-Aug-21' '26-Aug-21' '27-Aug-21' '31-Aug-21' '30-Aug-21' '29-Aug-21' '06 Sep 2021' '07 Sep 2021' '23 Sep 2021' '17 Sep 2021' '18 Sep 2021' '30 Sep 2021' '11 Sep 2021' '12 Sep 2021' '20 Sep 2021' '15 Sep 2021' '16 Sep 2021' '08 Sep 2021' '09 Sep 2021' '24 Sep 2021' '25 Sep 2021' '03 Sep 2021' '10 Sep 2021' '19 Sep 2021' '01 Sep 2021' '29 Sep 2021' '26 Sep 2021' '27 Sep 2021' '13 Sep 2021' '14 Sep 2021' '02 Sep 2021' '04 Sep 2021' ...]

Is there a preferred method in python that solves this issue?

CodePudding user response:

I recommend dateutil for this:

import dateutil
DF_all["SALES_DATE"] = DF_all["SALES_DATE"].apply(dateutil.parser.parse)

Output:

>>> DF_all
0    2021-08-18
1    2021-08-16
2    2021-08-17
3    2021-08-22
4    2021-08-21
5    2021-08-20
6    2021-08-19
7    2021-08-23
8    2021-08-24
9    2021-08-25
...
Name: 0, dtype: datetime64[ns]

You might need to install dateutil first. Run the following in a terminal:

pip install python-dateutil

Or, in IPython or a Jupyter Notebook, run:

!pip install python-dateutil
  • Related