For a Java homework assignment I've been tasked with evaluating an arithmetic expression and deciding whether or not it is valid. The expressions contain three types of brackets ({}, [], ()), digits, whitespace, and the +, -, * and / operators. However, the strings could also contain random junk, in which case I shouldn't parse the expression at all.
I have implemented the following regex, and hopefully it will work:
Pattern p = Pattern.compile("[^0-9\\[-\\]\\{-\\}\\(-\\) *\\/\\s-]");
However, I am not very familiar with regex and was wondering if anyone could take a second look?
CodePudding user response:
The suggested syntax isn't regular. The 'regular' in 'Regular Expression' isn't just an arbitrary word, nor is it the last name of its inventor. It's a description of a subset of all imaginable syntaxes.
If a syntax isn't regular, a regex can't parse it properly. Many, many things aren't regular. Basic math (pretty much everything with hierarchy/recursive elements) isn't.
You can't use regexes to parse this stuff.
It sounds like your plan is: "Let's first reject invalid inputs, and then I'll start thinking about where to go from there". This is the wrong approach; you've got it twisted around. Start parsing the input; if it's invalid, you'll figure that out trivially as you process it.
The answer to 'parse e.g. 5 + (2 * {3 * 10}) * [6 / 2]' does not involve regular expressions in any way.
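To make the "just parse it" advice concrete, here is a minimal recursive-descent sketch. It assumes the usual precedence (* and / bind tighter than + and -), integer arithmetic on long, no unary minus, and that every bracket must be closed by its own kind. None of that is spelled out in the question, and the ExprParser class and its method names are invented for this example.

```java
import java.util.OptionalLong;

// Hypothetical helper: evaluates an expression, or reports that it is invalid.
class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) { this.src = src; }

    /** Returns the value, or an empty OptionalLong if the input is not a valid expression. */
    static OptionalLong evaluate(String input) {
        ExprParser p = new ExprParser(input);
        try {
            long value = p.expression();
            p.skipWhitespace();
            // Anything left over (e.g. the "2" in "5 2") makes the input invalid.
            return p.pos == input.length() ? OptionalLong.of(value) : OptionalLong.empty();
        } catch (IllegalArgumentException | ArithmeticException e) {
            // Syntax error, number too large for long, overflow, or division by zero.
            return OptionalLong.empty();
        }
    }

    // expression := term (('+' | '-') term)*
    private long expression() {
        long value = term();
        while (true) {
            skipWhitespace();
            if (accept('+')) value = Math.addExact(value, term());
            else if (accept('-')) value = Math.subtractExact(value, term());
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private long term() {
        long value = factor();
        while (true) {
            skipWhitespace();
            if (accept('*')) value = Math.multiplyExact(value, factor());
            else if (accept('/')) value = value / factor();
            else return value;
        }
    }

    // factor := digits | '(' expression ')' | '[' expression ']' | '{' expression '}'
    private long factor() {
        skipWhitespace();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        int kind = "([{".indexOf(c);
        if (kind >= 0) {
            pos++;
            long value = expression();
            skipWhitespace();
            // The closing bracket must be of the same kind as the opening one.
            if (!accept(")]}".charAt(kind))) throw new IllegalArgumentException("bracket mismatch at " + pos);
            return value;
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (start == pos) throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException) if the number is too large.
        return Long.parseLong(src.substring(start, pos));
    }

    private boolean accept(char expected) {
        if (pos < src.length() && src.charAt(pos) == expected) { pos++; return true; }
        return false;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("5 + (2 * {3 * 10}) * [6 / 2]"));
        System.out.println(evaluate("(5 + 2}"));
    }
}
```

With this sketch, evaluate("5 + (2 * {3 * 10}) * [6 / 2]") yields 185, while inputs such as "(5 + 2}", "5 2" or a letter anywhere in the string come back empty. No separate validation pass is needed: invalid input is simply whatever the parser cannot consume.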
Note that \\{-\\} means: "All characters whose Unicode values lie between the Unicode value of the { char and the Unicode value of the } char are valid", which you didn't mean there. You wanted just \\{\\} (as in: the { character and the } character). Once you fix this, your regex will correctly 'detect' any characters that are flat out invalid, such as the letter 'a'. However, it will allow a great many things that aren't valid, such as:
(
{} * ()
5 2
---5555---
///
(5 2}
99999999999999999999999999 (that's more than fits in int)
)5 2
{(5 2) * (3 [5 2)}
You can't write a regex that properly rejects all of these.
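To see the range issue mentioned earlier in isolation, the small check below compares a character class written with a hyphen between the braces against one that simply lists them. It is only an illustration; the RangeDemo class and the inClass helper are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo of how '-' between two characters in a class forms a range.
class RangeDemo {
    /** True if the given character-class regex matches somewhere in s. */
    static boolean inClass(String characterClass, String s) {
        return Pattern.compile(characterClass).matcher(s).find();
    }

    public static void main(String[] args) {
        // '{' is U+007B and '}' is U+007D, so the range also covers '|' (U+007C).
        System.out.println(inClass("[\\{-\\}]", "|"));   // true
        // Listing the two braces without '-' matches only those two characters.
        System.out.println(inClass("[{}]", "|"));        // false
        // Same effect for '[' (U+005B) to ']' (U+005D): the backslash (U+005C) slips in.
        System.out.println(inClass("[\\[-\\]]", "\\"));  // true
    }
}
```

In the question's pattern this means | and \ are silently treated as allowed characters. The \\(-\\) range happens to be harmless only because ( and ) are adjacent code points.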
CodePudding user response:
The regular expression to validate just the mentioned set of characters may be simplified and used with String::matches:
// return true if expression contains only valid characters, false otherwise
private static boolean hasValidChars(String expr) {
    return expr.matches("[-+*/0-9\\s(){}\\[\\]]*");
}
For the given set of characters, only the square brackets [ and ] need to be escaped inside a character class; - does not need to be escaped when it is the first character in the class.
If the regular expression should return true when an invalid character is detected, the existing expression should be negated:
private static boolean hasInvalidChars(String expr) {
    return expr.matches("(?![-+*/0-9\\s(){}\\[\\]]*$).*");
}
Or the character set could be negated with [^...] (.* needs to be supplied on both sides to look for any invalid character in the expression, as matches checks the entire string):
private static boolean hasInvalidChars(String expr) {
    return expr.matches(".*([^-+*/\\s0-9(){}\\[\\]]).*");
}
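A variation worth considering (a sketch, not something the answer itself proposes): Matcher.find() searches for a match anywhere in the string, so the negated character class can be used on its own, without the surrounding .* that matches() requires. It also sidesteps the fact that . does not match line terminators by default, which can make the .*-based versions return false for input that contains both junk and a newline. The names CharCheck and containsInvalidChar are made up for illustration, and including + in the allowed set is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical alternative that uses find() instead of matches().
class CharCheck {
    // Compiled once; matches any single character outside the allowed set.
    // Assumption: '+' belongs to the allowed operators.
    private static final Pattern INVALID = Pattern.compile("[^-+*/0-9\\s(){}\\[\\]]");

    /** True if expr contains at least one character outside the allowed set. */
    static boolean containsInvalidChar(String expr) {
        return INVALID.matcher(expr).find();
    }

    public static void main(String[] args) {
        System.out.println(containsInvalidChar("(123 + 456) - 789"));
        System.out.println(containsInvalidChar("(33 / [11 - x])"));
    }
}
```

As with the versions above, this only screens the character set; an input like "(5 + 2}" still passes and has to be rejected by whatever parses the expression.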
Tests:
// requires: import java.util.Arrays;
for (String expr : Arrays.asList("(123 456) - 789", "abc 322", "(33 / [11 - x])")) {
    System.out.println(expr);
    System.out.println("invalid? " + hasInvalidChars(expr));
    System.out.println("valid? " + hasValidChars(expr));
    System.out.println("---");
}
Output:
(123 456) - 789
invalid? false
valid? true
---
abc 322
invalid? true
valid? false
---
(33 / [11 - x])
invalid? true
valid? false
---