I have a dataframe with the following column. Each row contains a string in one of several formats.
col |
----------------------
GRA/B
TPP
BBMY
...
SOCBBA 0 MAX
CMBD 0 MAX
EPR 5.75 MAX
...
PMUST 5.57643 02/15/34
LEO 0 12/30/2099
RGB 3.125 09/15/14
RGB 3.375 04/15/20
I want to convert all the dates to a format that shows the full four-digit year.
Is there a way to do this with a regex so that the column looks like this?
col |
----------------------
GRA/B
TPP
BBMY
...
SOCBBA 0 MAX
CMBD 0 MAX
EPR 5.75 MAX
...
PMUST 5.57643 02/15/2034
LEO 0 12/30/2099
RGB 3.125 09/15/2014
RGB 3.375 04/15/2020
Right now the only thing I can think of doing is
df['col'] = df['col'].str.replace('/14', '/2014')
for each year, but there are many years, and it would also replace matching days and months.
How can I achieve this properly? Should I be using regex?
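To make the problem with the per-year replacement concrete, here is a minimal sketch using a plain Python string. The row value is hypothetical (it is not in my data) and was picked so that the day and the two-digit year are the same number:

```python
# Hypothetical row where the day (14) equals the two-digit year (14).
row = "RGB 3.125 09/14/14"

# A plain substring replacement cannot tell the day from the year,
# so both "/14" occurrences are rewritten.
naive = row.replace("/14", "/2014")

print(naive)  # RGB 3.125 09/2014/2014
```

The day gets expanded along with the year, which is why I think a plain substring replacement is not enough.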
CodePudding user response:
What about replacing only when the string "ends with a slash followed by 2 digits"?
In [9]: df["col"] = df["col"].str.replace(r"/(\d{2})$", r"/20\1", regex=True)
In [10]: df
Out[10]:
col
0 GRA/B
1 TPP
2 BBMY
3 ...
4 SOCBBA 0 MAX
5 CMBD 0 MAX
6 EPR 5.75 MAX
7 ...
8 PMUST 5.57643 02/15/2034
9 LEO 0 12/30/2099
10 RGB 3.125 09/15/2014
11 RGB 3.375 04/15/2020
regex:
/ : a literal forward slash
(\d{2}) : capture 2 digits
$ : end of string
substituter:
/20 : a literal forward slash followed by 20
\1 : the first capturing group in the regex, i.e., the last 2 digits there
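If some non-date rows could also end in a slash and two digits, a stricter variant might help. The sketch below is only an illustration under two assumptions: dates always appear as mm/dd/yy at the end of the string, and every two-digit year belongs to the 2000s. The "XYZ/12" row is hypothetical and was added only to show a value that the shorter pattern would rewrite.

```python
import pandas as pd

# Sample rows from the question, plus a hypothetical non-date row ("XYZ/12").
df = pd.DataFrame({
    "col": [
        "GRA/B",
        "EPR 5.75 MAX",
        "PMUST 5.57643 02/15/34",
        "LEO 0 12/30/2099",
        "RGB 3.125 09/15/14",
        "XYZ/12",
    ]
})

# Require a full mm/dd/yy date at the end of the string:
#   \b              word boundary before the month
#   (\d{2}/\d{2})   group 1: month and day
#   /(\d{2})$       group 2: two-digit year at the end of the string
pattern = r"\b(\d{2}/\d{2})/(\d{2})$"

# Assumption: all two-digit years are 20xx, so "20" is prepended to group 2.
df["col"] = df["col"].str.replace(pattern, r"\1/20\2", regex=True)

print(df)
```

With this pattern, rows that already have a four-digit year and the "XYZ/12" row are left alone, while the mm/dd/yy rows get the 20 prefix. If the data could contain years from the 1900s, the fixed "20" prefix would need to be replaced with some pivot logic.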