Use a regex to find a pattern somewhere between two words

Time:01-28

Given the following string:

{"type":"PrimaryParty","name":"Karen","id":"456789-9996"},
{"type":"SecondaryParty","name":"Juliane","id":"345678-9996"},
{"type":"SecondaryParty","name":"Ellen","id":"001234-9996"}

I am looking for strings matching the pattern \d{6}-\d{4}, but only if they follow the string "SecondaryParty". The processor is Java-based.

Using https://regex101.com/, I came up with the following, which works fine with the ECMAScript (JavaScript) flavor:

(?<=SecondaryParty.*?)\d{6}-\d{4}(?=\"})

But as soon as I switch to the Java flavor, regex101 reports:

* A quantifier inside a lookbehind makes it non-fixed width
? The preceding token is not quantifiable

When using it in java.util.regex, the error says:

Look-behind group does not have an obvious maximum length near index 20
(?<=SecondaryParty.*?)\d{6}-\d{4}(?="})
                    ^

How do I overcome the "does not have an obvious maximum length" problem in Java?

CodePudding user response:

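You could use the regex (?<=SecondaryParty)(.*?)(\d{6}-\d{4})(?=\"}) and read the second capture group, which holds the text matching \d{6}-\d{4}. Because the variable-length part .*? now sits outside the lookbehind, the lookbehind has a fixed width and Java accepts it, while the id is still matched only when it follows the string "SecondaryParty".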

Sample Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdRegexMatcher {
    public static void main(String[] args) {
        String input = "{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";

        Pattern pattern = Pattern.compile("(?<=SecondaryParty)(.*?)(\\d{6}-\\d{4})(?=\\\"})");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            // Group 1 is the lazy filler (.*?); group 2 is the id itself.
            String idStr = matcher.group(2);
            System.out.println(idStr);
        }
    }
}

which gives the output
345678-9996
001234-9996

One possible optimization of the above regex is to use [^0-9]*? instead of .*?, under the assumption that the text between "SecondaryParty" and the id (the name, for instance) never contains digits.
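
As a rough sketch of that optimization (the class name DigitFreeGapMatcher and the method findSecondaryIds are made up for illustration and are not part of the original answer), the filler group can be swapped for [^0-9]*?. Since the filler no longer needs to be captured, the id ends up in group 1 here. This assumes nothing between "SecondaryParty" and the id contains a digit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class DigitFreeGapMatcher {
    // [^0-9]*? replaces .*? : the gap between the marker and the id
    // is assumed to contain no digits at all.
    private static final Pattern ID_AFTER_SECONDARY =
            Pattern.compile("(?<=SecondaryParty)[^0-9]*?(\\d{6}-\\d{4})(?=\"})");

    static List<String> findSecondaryIds(String input) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID_AFTER_SECONDARY.matcher(input);
        while (matcher.find()) {
            ids.add(matcher.group(1)); // group 1 is the id
        }
        return ids;
    }

    public static void main(String[] args) {
        String input = "{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";
        System.out.println(findSecondaryIds(input));
    }
}
```

If a name did contain a digit (say "R2D2"), that record would simply be skipped, so this variant only fits data where the assumption holds.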

CodePudding user response:

As a workaround, you can use a bounded curly-brace quantifier inside the lookbehind:

(?<=SecondaryParty.{0,255})\d{6}-\d{4}(?=\"})

The minimum and maximum in the curly-brace quantifier depend on your actual data. Java accepts the lookbehind as long as its maximum length is bounded.
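
A minimal sketch of how this could look in Java, using the sample data from the question (the class name BoundedLookbehindDemo and the method findSecondaryIds are hypothetical, and the upper bound of 255 is just the value from the regex above, not a recommendation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class BoundedLookbehindDemo {
    // .{0,255} gives the lookbehind a known maximum length,
    // which is what java.util.regex requires.
    private static final Pattern ID_AFTER_SECONDARY =
            Pattern.compile("(?<=SecondaryParty.{0,255})\\d{6}-\\d{4}(?=\"})");

    static List<String> findSecondaryIds(String input) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID_AFTER_SECONDARY.matcher(input);
        while (matcher.find()) {
            ids.add(matcher.group()); // the whole match is the id
        }
        return ids;
    }

    public static void main(String[] args) {
        String input = "{\"type\":\"PrimaryParty\",\"name\":\"Karen\",\"id\":\"456789-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Juliane\",\"id\":\"345678-9996\"},\n" +
                "{\"type\":\"SecondaryParty\",\"name\":\"Ellen\",\"id\":\"001234-9996\"}";
        System.out.println(findSecondaryIds(input));
    }
}
```

Note that . does not match line breaks by default, so with one record per line the lookbehind stays inside the current record. If several records share a line, an id in a later record could also be preceded by "SecondaryParty" within 255 characters; narrowing . to something like [^{}] is one way to keep the lookbehind inside a single object, assuming the values themselves contain no braces.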
