So I have a String that I want to split into tokens of different types as part of a larger parser.
String input = "45 + 31.05 * 110 @ 54";
I use Java's regex classes Pattern and Matcher to compile my regexes and find matches.
String floatRegex = "[0-9]+(\\.([0-9])+)?";
String additionRegex = "[+]";
String multiplicationRegex = "[*]";
String integerRegex = "[0-9]+";
All my regexes get merged into a single master regex, with pipe symbols between the individual regexes.
String masterOfRegexes = "[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+";
I pass this pattern to Pattern.compile() and get a Matcher. As I step through the input from left to right by calling matcher.find(), I expect to get the structure below, up to the "@" symbol, where an InvalidInputException should be thrown.
[
["Integer": "45"],
["addition": "+"],
["Float": "31.05"],
["multiplication": "*"],
["Integer": "110"]
Exception should be thrown...
]
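A side note on the labels in that expected structure: with the master regex in this order, "45" and "110" cannot come out as Integer. Java tries the alternatives from left to right and takes the first one that succeeds, not the longest, and the float regex's fractional part is optional, so it already matches every integer. The sketch below is only an illustration: the class name LabeledMasterRegex and the group names are hypothetical, and the float regex is written in the equivalent form [0-9]+(\.[0-9]+)?. It wraps each regex in a named group so you can see which alternative actually matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabeledMasterRegex {
    // Token types in the same order as the question's master regex.
    // The group names are hypothetical; Java group names must be letters/digits.
    static final String[] NAMES = {"Float", "addition", "multiplication", "Integer"};
    static final String[] REGEXES = {"[0-9]+(\\.[0-9]+)?", "[+]", "[*]", "[0-9]+"};

    // Wraps each regex in a named group and joins them with "|".
    static Pattern masterPattern() {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            parts.add("(?<" + NAMES[i] + ">" + REGEXES[i] + ")");
        }
        return Pattern.compile(String.join("|", parts));
    }

    // Returns "type: text" for every match that find() reports.
    static List<String> label(String input) {
        Matcher m = masterPattern().matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            for (String name : NAMES) {
                // group(name) is null when that alternative did not participate.
                if (m.group(name) != null) {
                    out.add(name + ": " + m.group());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        label("45 + 31.05 * 110").forEach(System.out::println);
    }
}
```

This prints Float for 45, 31.05 and 110; the Integer group never participates. Simply moving the integer regex first would split "31.05" into "31" and "05" instead. A float regex with a mandatory fraction, such as [0-9]+\.[0-9]+, placed before the integer regex avoids both problems.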
The problem is that matcher.find() skips the "@" symbol completely and instead finds the next Integer after it, "54".
Why does it skip the "@" symbol, and how can I make the exception get thrown on any character my pattern doesn't recognize?
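The behaviour is easy to see by recording where each match starts. The following is a small reproduction sketch, assuming the master regex above with its "+" characters in place; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindSkipDemo {
    // The question's master regex.
    static final Pattern MASTER = Pattern.compile("[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+");

    // Records "text at start" for every match. Each find() call resumes after
    // the previous match and scans forward, so characters that no alternative
    // accepts (spaces, "@") never show up as matches.
    static List<String> matchesWithPositions(String input) {
        Matcher m = MASTER.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " at " + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matchesWithPositions("45 + 31.05 * 110 @ 54"));
    }
}
```

The list jumps from "110 at 13" straight to "54 at 19"; the "@" at index 17 is never reported, because find() is a search, not a check that every character belongs to a token.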
Answer:
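A regex either matches or it doesn't. In your example data, find() does not skip over the @ by mistake; it simply doesn't match it. Each call to find() starts at the first character not consumed by the previous match and scans forward until the pattern matches somewhere, so characters that no alternative accepts are passed over silently.
What you could do is put all the valid alternatives in a single capture group and add a catch-all alternative, \S+, for any other run of non-whitespace characters. When looping through the matches, check whether group 1 is null.
If it is not null, the pattern has a valid group 1 match; otherwise only the catch-all matched, and you can throw your exception.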
See a regex demo and a Java demo.
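import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Group 1 captures the valid tokens; \S+ catches any other non-whitespace run
        String regex = "([0-9]+(?:\\.[0-9]+)?|[+]|[*]|[0-9]+)|\\S+";
        String string = "45 + 31.05 * 110 @ 54";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            if (matcher.group(1) == null) {
                // your Exception here
                // throw new Exception("No match!");
                System.out.println(matcher.group() + " -> no match");
            } else {
                System.out.println(matcher.group(1) + " -> match");
            }
        }
    }
}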
Output
45 -> match
+ -> match
31.05 -> match
* -> match
110 -> match
@ -> no match
54 -> match
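To make the loop throw instead of print, the same regex can be wrapped in a method. This is a sketch only: InvalidInputException is the question's own class, whose definition isn't shown, so a stand-in extending RuntimeException is declared here (an assumption, chosen so no throws clause is needed), and the class name CaptureGroupTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupTokenizer {
    // Hypothetical stand-in for the question's InvalidInputException.
    static class InvalidInputException extends RuntimeException {
        InvalidInputException(String message) {
            super(message);
        }
    }

    // Group 1 holds the valid tokens; the trailing \S+ alternative catches anything else.
    static final Pattern TOKENS =
            Pattern.compile("([0-9]+(?:\\.[0-9]+)?|[+]|[*]|[0-9]+)|\\S+");

    static List<String> tokenize(String input) {
        Matcher matcher = TOKENS.matcher(input);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            if (matcher.group(1) == null) {
                // Only the \S+ branch matched: reject the unrecognized text.
                throw new InvalidInputException(
                        "Unrecognized input '" + matcher.group() + "' at index " + matcher.start());
            }
            tokens.add(matcher.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
        try {
            tokenize("45 + 31.05 * 110 @ 54");
        } catch (InvalidInputException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because \S+ consumes the whole non-whitespace run, input without spaces such as "110@54" would report "@54" rather than just "@"; using \S instead of \S+ reports a single character.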
Answer:
Matcher offers three ways to match:
- matches: the entire input must match
- find: a match somewhere in the input
- lookingAt: a match from the start, but not necessarily to the end
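The difference between the three is easiest to see side by side. The sketch below runs each one on a fresh Matcher; the simple integer pattern, the class name and the compare helper are illustrative choices, not from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherModesDemo {
    static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Runs matches(), lookingAt() and find() on separate matchers
    // so that one call's state cannot affect another.
    static String compare(String input) {
        boolean matches = NUMBER.matcher(input).matches();     // whole input
        boolean lookingAt = NUMBER.matcher(input).lookingAt(); // prefix only
        Matcher m = NUMBER.matcher(input);
        String find = m.find() ? "found at " + m.start() : "not found"; // anywhere
        return "matches=" + matches + ", lookingAt=" + lookingAt + ", find=" + find;
    }

    public static void main(String[] args) {
        System.out.println(compare("45"));
        System.out.println(compare("45 + 31"));
        System.out.println(compare("@ 45"));
    }
}
```

For "@ 45", lookingAt() fails because the input does not start with a number, while find() scans past "@ " and succeeds at index 2. That scanning is exactly what hid the "@" in the question.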
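Your use of find skipped the "@", because find searches forward for the next match.
Use the rarely used lookingAt instead, or check the find start/end positions: if the text between the end of one match and the start of the next is not just whitespace, find skipped something.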
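Both suggestions can be sketched with the question's master regex. In the lookingAt version, Matcher.region moves the start of the search to the current position, so lookingAt() only succeeds if a token begins exactly there. In the find version, each gap between matches is checked. Assumptions in this sketch: whitespace between tokens is allowed and ignored, IllegalArgumentException stands in for the question's InvalidInputException, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictTokenizer {
    // The question's master regex.
    static final Pattern MASTER = Pattern.compile("[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+");

    // Approach 1: lookingAt() on a region starting at the current position.
    static List<String> tokenizeLookingAt(String input) {
        Matcher m = MASTER.matcher(input);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // assumption: whitespace between tokens is ignored
                continue;
            }
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw unexpected(input, pos); // no token starts here
            }
            tokens.add(m.group());
            pos = m.end(); // end() is an index into the whole input
        }
        return tokens;
    }

    // Approach 2: keep find(), but require that only whitespace lies between
    // the end of one match and the start of the next (and after the last one).
    static List<String> tokenizeFind(String input) {
        Matcher m = MASTER.matcher(input);
        List<String> tokens = new ArrayList<>();
        int expected = 0; // where the next token should begin
        while (m.find()) {
            checkGap(input, expected, m.start());
            tokens.add(m.group());
            expected = m.end();
        }
        checkGap(input, expected, input.length()); // trailing garbage
        return tokens;
    }

    private static void checkGap(String input, int from, int to) {
        for (int i = from; i < to; i++) {
            if (!Character.isWhitespace(input.charAt(i))) {
                throw unexpected(input, i);
            }
        }
    }

    // IllegalArgumentException is a stand-in for InvalidInputException.
    private static IllegalArgumentException unexpected(String input, int index) {
        return new IllegalArgumentException(
                "Unexpected '" + input.charAt(index) + "' at index " + index);
    }

    public static void main(String[] args) {
        System.out.println(tokenizeLookingAt("45 + 31.05 * 110"));
        System.out.println(tokenizeFind("45 + 31.05 * 110"));
        try {
            tokenizeLookingAt("45 + 31.05 * 110 @ 54");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Both methods reject the full input at the "@" (index 17) instead of moving on to "54". The lookingAt version stops at the first bad character. The find version only notices the gap once find() has located the next token, but it reports the same position.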