I have a list of strings in different formats, for example:
a=['08/58/13ND','08/58/16ND','08/58/18ND','114/15ND','04/2010/PB','AB/23/2016/CHE']
In each string, the last digits after a "/" are the YEAR. Some strings already have a four-digit year, such as 2010 and 2016, and I want to leave those as they are. In the other strings, the last digits after "/" are two-digit years such as 13, 16, 18 and 15.
I want those to be in YYYY format, with a "/" between the year and the trailing letters.
Expected Output:
['08/58/2013/ND','08/58/2016/ND','08/58/2018/ND','114/2015/ND','04/2010/PB','AB/23/2016/CHE']
CodePudding user response:
You could try as follows:
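import re
a = ['08/58/13ND','08/58/16ND','08/58/18ND','114/15ND','04/2010/PB','AB/23/2016/CHE']
# two digits, but only when followed by one or more capital letters at the end of the string
pattern = r'(\d{2})(?=[A-Z]+$)'
# prepend "20" to the captured digits and append "/"
b = [re.sub(pattern, r'20\1/', x) for x in a]
print(b)
Output:
['08/58/2013/ND', '08/58/2016/ND', '08/58/2018/ND', '114/2015/ND', '04/2010/PB', 'AB/23/2016/CHE']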
Explanation of r'(\d{2})(?=[A-Z]+$)': (\d{2}) is a capturing group for 2 digits, to be matched only if followed by one or more characters in [A-Z] at the end of the string ($). For this we use the positive lookahead (?=[A-Z]+$). Strings such as '04/2010/PB' are left alone because the year is followed by "/", not directly by letters.
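To see which strings the lookahead actually lets through, here is a small check on a few of the inputs from the question. The dictionary name matches is only for illustration.

```python
import re

pattern = re.compile(r'(\d{2})(?=[A-Z]+$)')

samples = ['08/58/13ND', '114/15ND', '04/2010/PB', 'AB/23/2016/CHE']

# Map each string to the two digits captured by group 1, or None if nothing matches.
matches = {}
for s in samples:
    m = pattern.search(s)
    matches[s] = m.group(1) if m else None
    print(s, '->', matches[s])
```

Only the first two strings produce a match, so re.sub leaves the strings that already carry a four-digit year unchanged.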
Explanation of r'20\1/': \1 references the aforementioned capturing group; we prepend 20 (assuming ALL your years are >= 2000) and append /.
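If some of your years could be before 2000, one possible variation is to pass a function to re.sub instead of a replacement string and pick the century there. This is only a sketch: the PIVOT cutoff of 50 and the helper names expand_year and normalize are my own assumptions, not something given in the question. I also added a lookbehind (?<=/) so that only two digits directly after a "/" are expanded, which is slightly stricter than the pattern above.

```python
import re

# Hypothetical cutoff: 00-49 is read as 20xx, 50-99 as 19xx. Adjust to your data.
PIVOT = 50

# Two digits directly after "/" and directly before trailing capital letters.
PATTERN = re.compile(r'(?<=/)(\d{2})(?=[A-Z]+$)')

def expand_year(match):
    """Turn the captured two-digit year into 'YYYY/'."""
    yy = int(match.group(1))
    century = 2000 if yy < PIVOT else 1900
    return f'{century + yy}/'

def normalize(items):
    """Expand two-digit years in every string; other strings pass through unchanged."""
    return [PATTERN.sub(expand_year, x) for x in items]

a = ['08/58/13ND', '08/58/16ND', '08/58/18ND', '114/15ND', '04/2010/PB', 'AB/23/2016/CHE']
print(normalize(a))
```

For the list in the question this gives the same result as the one-liner above, while an input like '01/98XY' would become '01/1998/XY' under the assumed cutoff.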