Convert string date column with format of ordinal numeral day, abbreviated month name, and normal ye

Time:03-11

Given the following df, whose date column holds strings with an ordinal day number, an abbreviated month name, and a four-digit year:

             date       oil       gas
0    1st Oct 2021       428        99
1   10th Sep 2021       401       101
2    2nd Oct 2020       189        74
3   10th Jan 2020       659       119
4    1st Nov 2019       691       130
5   30th Aug 2019       742       162
6   10th May 2019       805       183
7   24th Aug 2018       860       182
8    1st Sep 2017       759       183
9   10th Mar 2017       617       151
10  10th Feb 2017       591       149
11  22nd Apr 2016       343        88
12  10th Apr 2015       760       225
13  23rd Jan 2015      1317       316

How can I parse the date column into the standard %Y-%m-%d format?

My ideas so far: 1. strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the day string with re, keeping the day number; 2. convert the abbreviated month name to a number (%b does not seem to fit); 3. finally format the result as %Y-%m-%d.

Code may be useful for the first step:

df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)  # re's .sub() needs a str, not a Series
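
For a single string, the three steps can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation: parse_ordinal_date is a hypothetical helper name, and it assumes an English (or C) locale so that %b matches abbreviations such as Oct once the suffix is removed.

```python
import re
from datetime import datetime

# Matches an ordinal suffix only when it directly follows a digit,
# so letters inside a month name are never touched.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)")


def parse_ordinal_date(text):
    """Hypothetical helper: turn '1st Oct 2021' into '2021-10-01'."""
    # Step 1: drop the ordinal suffix, keeping the day number.
    cleaned = ORDINAL_SUFFIX.sub("", text.strip())
    # Step 2: %b parses the abbreviated month name (locale-dependent).
    parsed = datetime.strptime(cleaned, "%d %b %Y")
    # Step 3: render in the target format.
    return parsed.strftime("%Y-%m-%d")


samples = ["1st Oct 2021", "22nd Apr 2016", "23rd Jan 2015", "30th Aug 2019"]
converted = [parse_ordinal_date(s) for s in samples]
print(converted)
```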

References:

https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months

CodePudding user response:

pd.to_datetime already handles this case if you don't specify the format parameter:

>>> pd.to_datetime(df['date'])
0    2021-10-01
1    2021-09-10
2    2020-10-02
3    2020-01-10
4    2019-11-01
5    2019-08-30
6    2019-05-10
7    2018-08-24
8    2017-09-01
9    2017-03-10
10   2017-02-10
11   2016-04-22
12   2015-04-10
13   2015-01-23
Name: date, dtype: datetime64[ns]
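
Note that the result above is a datetime64 column, not strings. If you would rather not depend on format inference, or you need actual %Y-%m-%d strings, something along these lines may work. It is only a sketch: the small frame and the date_iso column name are made up for illustration, and %b assumes English month abbreviations.

```python
import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame(
    {"date": ["1st Oct 2021", "10th Sep 2021", "22nd Apr 2016", "23rd Jan 2015"]}
)

# Drop the ordinal suffix that directly follows the day number.
cleaned = df["date"].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)

# Parse with an explicit format instead of relying on inference.
parsed = pd.to_datetime(cleaned, format="%d %b %Y")

# Render as plain strings in the requested format.
df["date_iso"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

An explicit format fails loudly on rows that do not match, which can be preferable to silent guessing on messy data.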