Regular Expression - Terminating character appears in string

Time:05-27

I'm back with another regular expression question. I've tried a few things and I can't seem to crack this unfortunate issue with some messaging data I have. I need to parse a particular value out of a SWIFT message. This works in 99% of my cases, but sometimes someone has entered the terminating character in a field I care about.

Imagine I have a text string like this:

some noise :50F: some noise 3/GB some noise :50A: 

My expression looks for the two characters that come after 3/ in the field :50F: and is coded as follows:

50F:[^:]*?3\/([A-Z]{2})

I use [^:] because I only care about values in the 50F field. For example, if I had a string like this:

some noise :50F: some noise some noise :50A: 3/GB 

I wouldn't want to match GB.

This works really well, apart from the very rare occasions where the field itself contains a : before it ends (it seems there is no restriction on this). For example:

some noise :50F: some : noise 3/GB some noise :50A: 

This obviously returns nothing; it's only really searching "some" there.
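
The three cases above can be reproduced with a short script. This is only a sketch: the question does not say which regex flavour is in use, so Python's re module is an assumption, and the names original_pattern, samples and results are made up for illustration.

```python
import re

# The pattern from the question: [^:] stops the scan at the first colon
# after "50F:", whether or not that colon starts a new field tag.
original_pattern = re.compile(r"50F:[^:]*?3\/([A-Z]{2})")

samples = [
    "some noise :50F: some noise 3/GB some noise :50A: ",    # value inside 50F
    "some noise :50F: some noise some noise :50A: 3/GB ",    # value in a later field
    "some noise :50F: some : noise 3/GB some noise :50A: ",  # stray colon inside 50F
]

results = []
for text in samples:
    m = original_pattern.search(text)
    results.append(m.group(1) if m else None)

print(results)  # ['GB', None, None]
```

The first two results are what I want; the third should be GB but comes back empty because of the stray colon.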

The issue is that it's not necessarily :50A: that follows this field. It could be any one of a number of fields (and I am not even sure of the list), but each field tag is :[0-9]{2,3}[A-Z]{0,1}:. Is there any way to make the search stop when it reaches something of that pattern, instead of the single colon I am using currently?

I suspect the solution is some kind of negative lookahead, but I've not managed to get anything to work so far.

CodePudding user response:

You can use

50F:(?:(?!:[0-9]{2,3}[A-Z]?:).)*?3\/([A-Z]{2})


Details:

  • 50F: - a literal string
  • (?:(?!:[0-9]{2,3}[A-Z]?:).)*? - any single char other than a line break char, zero or more occurrences but as few as possible, where no consumed char starts the following pattern: a : char, two or three digits, an optional ASCII uppercase letter, and a : char (this construct is commonly called a tempered greedy token)
  • 3\/ - a literal 3/ string
  • ([A-Z]{2}) - Group 1: two uppercase letters.
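
As a quick check of the pattern above, here is a minimal sketch in Python. The regex flavour is an assumption, since the question does not name one, and FIELD_TAG, tempered_pattern and value_after_50f are hypothetical names used only for this example. The \/ escape is written as a plain / because Python does not need it.

```python
import re

# Field tag as described in the question: colon, 2-3 digits,
# optional uppercase letter, colon.
FIELD_TAG = r":[0-9]{2,3}[A-Z]?:"

# Tempered greedy token: consume one char at a time, but only while
# the current position does not start a new field tag.
tempered_pattern = re.compile(
    r"50F:(?:(?!" + FIELD_TAG + r").)*?3/([A-Z]{2})"
)

def value_after_50f(text):
    """Return the two letters after '3/' inside field 50F, or None."""
    m = tempered_pattern.search(text)
    return m.group(1) if m else None

print(value_after_50f("some noise :50F: some noise 3/GB some noise :50A: "))    # GB
print(value_after_50f("some noise :50F: some noise some noise :50A: 3/GB "))    # None
print(value_after_50f("some noise :50F: some : noise 3/GB some noise :50A: "))  # GB
```

Because . does not match line breaks by default, a 50F field that spans several lines would likely need the re.DOTALL flag (or the equivalent in your regex engine) for the scan to reach a 3/ on a later line.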