Converting a column of strings with different formats that contains year from 2 digits (YY) to 4 dig

Time:12-28

I have a DataFrame with the following column. The rows contain strings in different formats.

col                  |
----------------------
GRA/B
TPP
BBMY
...
SOCBBA 0 MAX
CMBD 0 MAX
EPR 5.75 MAX
...
PMUST 5.57643 02/15/34
LEO 0 12/30/2099
RGB 3.125 09/15/14
RGB 3.375 04/15/20

I want to convert all the dates to a format that shows the full year.

Is there a way to do this with a regex so that the column looks like this?

col                  |
----------------------
GRA/B
TPP
BBMY
...
SOCBBA 0 MAX
CMBD 0 MAX
EPR 5.75 MAX
...
PMUST 5.57643 02/15/2034
LEO 0 12/30/2099
RGB 3.125 09/15/2014
RGB 3.375 04/15/2020

Right now, the only thing I can think of is running

df['col'] = df['col'].str.replace('/14', '/2014')

for each year, but there are many years, and it would also replace matching days and months.
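
To make the second problem concrete, here is a small sketch using plain Python `str.replace` on a made-up value (it is not in the column above) where the day and the two-digit year are both 14:

```python
# Hypothetical value: the day and the two-digit year are both "14".
value = "RGB 3.125 09/14/14"

# A plain substring replacement cannot tell the day from the year,
# so both occurrences of "/14" are rewritten.
broken = value.replace("/14", "/2014")
print(broken)  # RGB 3.125 09/2014/2014
```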

How can I do this properly? Should I be using a regex?

CodePudding user response:

What about replacing only when the string ends with a slash followed by 2 digits?

In [9]: df["col"] = df["col"].str.replace(r"/(\d{2})$", r"/20\1", regex=True)

In [10]: df
Out[10]:
                         col
0                      GRA/B
1                        TPP
2                       BBMY
3                        ...
4               SOCBBA 0 MAX
5                 CMBD 0 MAX
6               EPR 5.75 MAX
7                        ...
8   PMUST 5.57643 02/15/2034
9           LEO 0 12/30/2099
10      RGB 3.125 09/15/2014
11      RGB 3.375 04/15/2020

regex:

  • /: a literal forward slash
  • (\d{2}): capture 2 digits
  • $: end of string

replacement:

  • /20: literally forward slash and 20
  • \1: first capturing group in the regex, i.e., the 2 digits matched at the end of the string
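
One caveat: the pattern above puts 20 in front of every trailing two-digit group, so a value ending in /99 would become /2099, and any non-date string that happens to end in a slash plus two digits would change as well. Below is a rough sketch of a stricter variant, written with the standard `re` module so it runs on its own. It only matches a full MM/DD/YY date at the end of the string and picks the century with a pivot. The names `PIVOT`, `DATE_YY` and `expand_year` are made up for illustration, and the pivot of 50 is an arbitrary assumption that should be adjusted to the data.

```python
import re

# Assumption: two-digit years below PIVOT are 20xx, the rest are 19xx.
PIVOT = 50

# A full MM/DD/YY date at the end of the string; the year is group 2.
DATE_YY = re.compile(r"(\d{2}/\d{2}/)(\d{2})$")

def expand_year(match: re.Match) -> str:
    """Return the matched date with its year widened to four digits."""
    century = "20" if int(match.group(2)) < PIVOT else "19"
    return f"{match.group(1)}{century}{match.group(2)}"

# A few values taken from the column in the question.
samples = [
    "GRA/B",
    "EPR 5.75 MAX",
    "PMUST 5.57643 02/15/34",
    "LEO 0 12/30/2099",
    "RGB 3.125 09/15/14",
]
converted = [DATE_YY.sub(expand_year, s) for s in samples]
for line in converted:
    print(line)
```

Since `Series.str.replace` accepts a callable as the replacement when `regex=True`, the same function should work on the column as `df["col"].str.replace(DATE_YY.pattern, expand_year, regex=True)`.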