I am working through a huge list of customer package names that need to be parsed to extract price information. Sample package names are as follows:
- Jan24_Package1_USD2_Rest_Of_String
- Jan25_Package2_2USD_Rest_Of_String
- Jan26_Package3_USD_2_Rest_Of_String
- Jan24_Package4_2_USD_Rest_Of_String
So in the first and third strings USD precedes the value 2, and in the other two USD follows it. I am looking for a regular expression that will output 2 in all of these cases.
I was trying to capture the value with group 3, (\d+), in the following pattern:
(USD)(_*)(\d+)(_*)
This works fine for strings 1 and 3, but it doesn't work for strings 2 and 4.
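Running that pattern over the four samples shows the failure directly. It only allows the digits to come after USD, so `search` finds nothing when USD trails the number. This is a small diagnostic sketch; `results` is just an illustrative name.

```python
import re

strings = ['Jan24_Package1_USD2_Rest_Of_String',
           'Jan25_Package2_2USD_Rest_Of_String',
           'Jan26_Package3_USD_2_Rest_Of_String',
           'Jan24_Package4_2_USD_Rest_Of_String']

# The question's pattern: digits are only allowed *after* USD.
pattern = re.compile(r'(USD)(_*)(\d+)(_*)')

results = []
for s in strings:
    m = pattern.search(s)
    # Group 3 holds the digits whenever the pattern matches at all.
    results.append(m.group(3) if m else None)

print(results)  # ['2', None, '2', None]
```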
Looking for a solution here. Thanks a lot.
Answer:
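This can be solved by matching both cases in one regular expression with an alternation: the value ends up in capture group 2 when USD leads and in capture group 3 when USD trails.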
```python
import re

strings = ['Jan24_Package1_USD2_Rest_Of_String',
           'Jan25_Package2_2USD_Rest_Of_String',
           'Jan26_Package3_USD_2_Rest_Of_String',
           'Jan24_Package4_2_USD_Rest_Of_String']

for string in strings:
    # Group 2 captures the digits when USD leads (USD2, USD_2);
    # group 3 captures them when USD trails (2USD, 2_USD).
    match = re.search(r'.*_(USD_?(\d+)|(\d+)_?USD)', string)
    if match:
        # Print group 2, or group 3 if group 2 is empty
        if match.group(2):
            print(match.group(2))
        else:
            print(match.group(3))
```
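If juggling two groups feels awkward, one possible variant (a sketch, not the method used above) moves the USD check into lookarounds so the digits are the entire match. It assumes, as in the samples, that the price token always starts right after an underscore; `PRICE` and `extract_price` are hypothetical names for illustration. Python's `re` only supports fixed-width lookbehinds, so `_USD` and `_USD_` need two separate lookbehinds.

```python
import re

strings = ['Jan24_Package1_USD2_Rest_Of_String',
           'Jan25_Package2_2USD_Rest_Of_String',
           'Jan26_Package3_USD_2_Rest_Of_String',
           'Jan24_Package4_2_USD_Rest_Of_String']

# Hypothetical single-match pattern (assumes the price token follows an "_"):
#   (?<=_USD)\d+         digits directly after "_USD"   e.g. _USD2
#   (?<=_USD_)\d+        digits after "_USD_"           e.g. _USD_2
#   (?<=_)\d+(?=_?USD)   digits after "_" and before USD, e.g. _2USD, _2_USD
PRICE = re.compile(r'(?<=_USD)\d+|(?<=_USD_)\d+|(?<=_)\d+(?=_?USD)')


def extract_price(name):
    """Hypothetical helper: return the digits next to USD, or None."""
    m = PRICE.search(name)
    return m.group(0) if m else None


prices = [extract_price(s) for s in strings]
print(prices)  # ['2', '2', '2', '2']
```

The `(?<=_)` in the last alternative matters: without it, the "1" in `Package1_USD2` would be read as a trailing-USD price, because it is immediately followed by `_USD`.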