Java regex Matcher exception on unknown character

Time:09-24

So I have a String I want to split into tokens of different types as part of a larger Parser.

String input = "45 + 31.05 * 110 @ 54";

I use Java's regex classes Pattern and Matcher to interpret my regexes and find matches.

String floatRegex = "[0-9]+(\\.([0-9])+)?";
String additionRegex = "[+]";
String multiplicationRegex = "[*]";
String integerRegex = "[0-9]+";

All my regexes get merged into a single master regex, with pipe symbols between the individual regexes.

String masterOfRegexes = "[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+";
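
Rather than maintaining the merged string by hand, the parts could be joined programmatically. This is only a sketch: it assumes the quantifiers in the individual regexes are `+` (as in `[0-9]+`), and the class name `MasterRegexDemo` and the constant names are made up for illustration.

import java.util.regex.Pattern;

class MasterRegexDemo {
    static final String FLOAT_REGEX = "[0-9]+(\\.([0-9])+)?";
    static final String ADDITION_REGEX = "[+]";
    static final String MULTIPLICATION_REGEX = "[*]";
    static final String INTEGER_REGEX = "[0-9]+";

    // Joins the parts with "|" in the same order as the question.
    static String masterRegex() {
        return String.join("|",
                FLOAT_REGEX, ADDITION_REGEX, MULTIPLICATION_REGEX, INTEGER_REGEX);
    }

    public static void main(String[] args) {
        String master = masterRegex();
        Pattern.compile(master); // throws PatternSyntaxException if a part is malformed
        System.out.println(master);
    }
}

Note that alternation is tried left to right, and the float regex has an optional fraction, so it already matches plain integers such as "45". With this ordering the trailing integer alternative is never the one that matches.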

I pass this pattern to Pattern.compile() and get the matcher. As I step through from left to right by calling matcher.find(), I expect to get the structure below, up to the "@" symbol, where an InvalidInputException should be thrown.

[
  ["Integer": "45"],
  ["addition": "+"],
  ["Float": "31.05"],
  ["multiplication": "*"],
  ["Integer": "110"]
  Exception should be thrown...
]

The problem is that matcher.find() skips the "@" symbol completely and instead finds the next Integer past the "@", which is "54".
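
To make the skipping visible, one could record where each match starts and ends. The sketch below assumes the master regex uses `+` quantifiers; `FindSkipDemo` and `findAll` are hypothetical names used only for this illustration.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindSkipDemo {
    private static final Pattern MASTER =
            Pattern.compile("[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+");

    // Records "start-end:text" for every match that find() reports.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MASTER.matcher(input);
        while (matcher.find()) {
            found.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The gaps between one match's end and the next match's start
        // are the characters that find() stepped over.
        System.out.println(findAll("45 + 31.05 * 110 @ 54"));
    }
}

For the input above, the match "110" ends at index 16 and the next match "54" starts at index 19, so indices 16 to 18 (a space, the "@", and another space) were passed over. The single spaces between the other tokens are stepped over in exactly the same way.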

Why does it skip the "@" symbol and how can I make it so the exception gets thrown on a character it doesn't recognize from my pattern?

CodePudding user response:

A regex either matches or it does not. In your example data, find() does not treat the @ as an error; the pattern simply does not match there, so the search continues at the next position where it does.

What you could do is put the valid alternatives in a single capture group and add a catch-all alternative outside that group. Then, when looping through the matches, check whether group 1 is null.

If group 1 is not null, the match is one of your valid tokens; otherwise you can throw your Exception.

See a regex demo and a Java demo.

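import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 holds the valid tokens; the \S+ alternative catches anything else.
String regex = "([0-9]+(?:\\.[0-9]+)?|[+]|[*]|[0-9]+)|\\S+";
String string = "45 + 31.05 * 110 @ 54";

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    if (matcher.group(1) == null) {
        // Only the catch-all alternative matched: unrecognized input.
        // your Exception here
        // throw new Exception("No match!");
        System.out.println(matcher.group() + " -> no match");
    } else {
        System.out.println(matcher.group(1) + " -> match");
    }
}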

Output

45 -> match
+ -> match
31.05 -> match
* -> match
110 -> match
@ -> no match
54 -> match
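
Building on the same idea, the token type could be recovered with one named group per alternative. This is a sketch under stated assumptions, not the answer's own code: the class name `NamedGroupTokenizer` and the group names are invented, the float alternative is changed to require a fractional part so that plain integers are reported as Integer, and `IllegalArgumentException` stands in for the question's InvalidInputException.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupTokenizer {
    // BAD is the catch-all for any run of non-whitespace that is not a token.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<FLOAT>[0-9]+\\.[0-9]+)|(?<INT>[0-9]+)|(?<ADD>[+])|(?<MUL>[*])|(?<BAD>\\S+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("BAD") != null) {
                // Stand-in for the question's InvalidInputException.
                throw new IllegalArgumentException(
                        "Unexpected input '" + m.group() + "' at index " + m.start());
            } else if (m.group("FLOAT") != null) {
                tokens.add("Float:" + m.group());
            } else if (m.group("INT") != null) {
                tokens.add("Integer:" + m.group());
            } else if (m.group("ADD") != null) {
                tokens.add("addition:" + m.group());
            } else {
                tokens.add("multiplication:" + m.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
        tokenize("45 + 31.05 * 110 @ 54"); // throws when it reaches the "@"
    }
}

Whitespace is still skipped silently by find() here, which is usually what a tokenizer wants.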

CodePudding user response:

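Matcher offers three ways to match:

  • matches: the pattern must match the entire input
  • find: the pattern may match anywhere in the input
  • lookingAt: the pattern must match from the start, but not necessarily to the end

Your use of find is what skipped the "@". Use the less common lookingAt (moving the start of the region forward after each token), or check the start/end positions of each find match for gaps.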
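
A rough sketch of the lookingAt approach, assuming `+` quantifiers in the pattern and assuming that whitespace between tokens should be ignored (the original pattern has no whitespace alternative). `LookingAtTokenizer` is a hypothetical name, and `IllegalArgumentException` stands in for the question's InvalidInputException.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookingAtTokenizer {
    private static final Pattern TOKEN = Pattern.compile("[0-9]+(\\.[0-9]+)?|[+]|[*]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Assumption: whitespace separates tokens and carries no meaning.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            // Restrict matching to the unread part of the input.
            matcher.region(pos, input.length());
            // lookingAt() succeeds only if a token starts exactly at pos.
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at index " + pos);
            }
            tokens.add(matcher.group());
            pos = matcher.end(); // end() is an index into the whole input
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
        tokenize("45 + 31.05 * 110 @ 54"); // throws when it reaches the "@"
    }
}

Because lookingAt never searches ahead, nothing can be skipped: every character is either whitespace, part of a token, or the cause of the exception.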
