I have to use a regular expression to parse values out of a swift message and there are some situations where the behaviour is not what I want.
Lets say I am after something with a particular pattern - in this case a BIC (6 letters, followed by 2 letters or digits followed by optional XXX
or 3 digits)
([A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
this is fine but now I want to look for these bank codes in particular fields. In swift a field is denoted with :
and has some numbers and sometimes a letter.
so I want to match a BIC value in field 52A
I can do the following
(52A:[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
which would match 52A:AAAAAAAAXXX
my problem is you can have things before and after this value - and the value itself might not exist in the field you want
so I can wildcard the reg ex to allow for things before it for example
(52A:.*?[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
matches 52A:somerubbishAAAAAAAAXXX
but if there isnt something within this field - the reg ex continues to search for the pattern and this is where i have a problem.
for example the above reg ex matches this 52A:somerubbish:57D:AAAAAAAAXXX
Question
I need the reg ex to stop on the first field that is after it (it might not always be 57D
but it will always follow the format [0-9]{2}[A-Z]{0,1})
so the above example shouldnt return a match as the pattern I am after is not contained in the 52A
section
Does anyone know how I can do this?
CodePudding user response:
Change .*?
to [^:]*?
:
(52A:[^:]*?[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
[^:]
means "any character except :", which ensures the match doesn't run into the next field.
See live demo.
Also, unless your situation requires you to match your target as group 1, you don't need the outer brackets: the entire match (ie group 0) will be your target.
I suspect instead of [XXX0-9]{0,3}
you want (XXX|\d{3})?
(XXX or 3 digits, but optionally) or perhaps (XXX|\d{1,3})?
(XXX or up to 3 digits, but optionally)
CodePudding user response:
Using [XXX0-9]{0,3}
(which is the same as [X0-9]{0,3}
) is a character class notation, repeating 0-3 times an X char or a digit.
If the value itself can also contain a colon, you can match any character as "rubbish" as long as what is directly to the right is not the field format.
52A:(?:(?![0-9]{2}[A-Z]?:).)*[A-Z]{6}[A-Z0-9]{2}(?:[0-9]{3}|XXX)?
The pattern matches:
52A:
Match literally(?:(?![0-9]{2}[A-Z]?:).)*
Match any character asserting not 2 digits, optional char A-Z and : directly to the right[A-Z]{6}[A-Z0-9]{2}
Match 6 chars A-Z and 2 chars A-Z or 0-9(?:[0-9]{3}|XXX)?
Optionally match 3 digits or XXX
See a regex demo.