Given the following df, whose date column holds strings with an ordinal number for the day, an abbreviated month name, and a four-digit year:
             date   oil  gas
0    1st Oct 2021   428   99
1   10th Sep 2021   401  101
2    2nd Oct 2020   189   74
3   10th Jan 2020   659  119
4    1st Nov 2019   691  130
5   30th Aug 2019   742  162
6   10th May 2019   805  183
7   24th Aug 2018   860  182
8    1st Sep 2017   759  183
9   10th Mar 2017   617  151
10  10th Feb 2017   591  149
11  22nd Apr 2016   343   88
12  10th Apr 2015   760  225
13  23rd Jan 2015  1317  316
How can I parse the date column into the standard %Y-%m-%d format?
My ideas so far: 1. strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the day string with re, keeping the day number; 2. convert the abbreviated month names to numbers (%b does not seem to work); 3. finally, format the result as %Y-%m-%d.
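Code that may be useful for the first step (re's sub expects a single string, so the pattern is applied through the Series .str accessor instead):
df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)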
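For illustration, the three steps listed above could be chained roughly as follows. This is only a sketch under a few assumptions: the frame is rebuilt from four of the sample rows, the names cleaned and parsed are placeholders, and the month abbreviations are English, which is what %b matches in the default C/English locale.

```python
import pandas as pd

# A few rows from the sample data (illustrative subset)
df = pd.DataFrame({
    "date": ["1st Oct 2021", "10th Sep 2021", "2nd Oct 2020", "23rd Jan 2015"],
    "oil": [428, 401, 189, 1317],
    "gas": [99, 101, 74, 316],
})

# Step 1: drop the ordinal suffix that directly follows the day digits
cleaned = df["date"].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)

# Step 2: parse day, abbreviated month name and year with an explicit format
parsed = pd.to_datetime(cleaned, format="%d %b %Y")

# Step 3: render the datetimes as %Y-%m-%d strings
df["date"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

The lookbehind keeps the pattern from touching letters that do not follow a digit, so month names are left alone.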
References:
https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months
CodePudding user response:
pd.to_datetime already handles this case if you don't specify the format parameter:
>>> pd.to_datetime(df['date'])
0 2021-10-01
1 2021-09-10
2 2020-10-02
3 2020-01-10
4 2019-11-01
5 2019-08-30
6 2019-05-10
7 2018-08-24
8 2017-09-01
9 2017-03-10
10 2017-02-10
11 2016-04-22
12 2015-04-10
13 2015-01-23
Name: date, dtype: datetime64[ns]
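Note that the result above has dtype datetime64[ns], not string. If literal %Y-%m-%d text is wanted, a .dt.strftime call can be appended. The sketch below is an assumption-laden variant rather than the exact call shown above: it parses each element with dateutil (a pandas dependency whose parser skips ordinal suffixes), which may be handy if format-free parsing behaves differently in your pandas version. The names raw, parsed and as_text are placeholders.

```python
import pandas as pd
from dateutil import parser

# Illustrative subset of the date strings from the question
raw = pd.Series(
    ["1st Oct 2021", "10th Sep 2021", "22nd Apr 2016", "23rd Jan 2015"],
    name="date",
)

# Parse element by element; dateutil ignores 'st', 'nd', 'rd', 'th'
parsed = pd.to_datetime(raw.map(parser.parse))

# Convert the datetime64 values to %Y-%m-%d strings
as_text = parsed.dt.strftime("%Y-%m-%d")
print(as_text)
```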