Looking for a regex to match a specific arithmetic String in Java

Time:11-13

For a Java homework assignment I've been tasked with evaluating an arithmetic expression and deciding whether or not it is valid. The expressions contain three types of brackets ( {}, [], () ), digits, whitespace, and the +, -, * and / operators. However, the strings could also contain random junk, in which case I shouldn't parse the expression at all.

I have implemented the following regex, and hopefully it will work:

Pattern p = Pattern.compile("[^0-9\\[-\\]\\{-\\}\\(-\\)+*\\/\\s-]");

However, I am not very familiar with regex and was wondering if anyone could take a second look?

CodePudding user response:

The suggested syntax isn't regular. The 'regular' in 'Regular Expression' isn't just an arbitrary word, nor is it the last name of its inventor. It's a description of a subset of all imaginable syntaxes.

If a syntax isn't regular, a regex can't parse it properly. Many, many things aren't regular. Basic math (pretty much everything with hierarchy/recursive elements) isn't.

You can't use regexes to parse this stuff.

It sounds like your plan is: "Let's first reject invalid inputs, and then I'll start thinking about where to go from there". That is the wrong way around. Start parsing the input; if it's invalid, you'll find that out trivially as you process it.

The answer to 'parse e.g. 5 + (2 * {3 * 10}) * [6 / 2]' does not involve regular expressions in any way.
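Parsing first and letting invalid input surface along the way could look roughly like the recursive-descent sketch below. The class name ExprValidator, the grammar in the comments, and the choice to accept + alongside -, * and / are assumptions made for illustration, not part of the homework statement; unary minus and integer range checks are deliberately left out.

```java
final class ExprValidator {
    private final String src;
    private int pos = 0;

    private ExprValidator(String src) { this.src = src; }

    /** Returns true if the whole string is one well-formed expression. */
    static boolean isValid(String s) {
        ExprValidator v = new ExprValidator(s);
        try {
            v.expression();
            v.skipSpaces();
            return v.pos == s.length(); // leftover input means invalid
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // expression := term (('+' | '-') term)*
    private void expression() {
        term();
        while (peekIs('+') || peekIs('-')) { pos++; term(); }
    }

    // term := factor (('*' | '/') factor)*
    private void term() {
        factor();
        while (peekIs('*') || peekIs('/')) { pos++; factor(); }
    }

    // factor := digits | '(' expression ')' | '[' expression ']' | '{' expression '}'
    private void factor() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return;
        }
        char close = c == '(' ? ')' : c == '[' ? ']' : c == '{' ? '}' : '\0';
        if (close == '\0') throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
        pos++; // consume the opening bracket
        expression();
        skipSpaces();
        if (pos >= src.length() || src.charAt(pos) != close) {
            throw new IllegalArgumentException("expected '" + close + "' at " + pos);
        }
        pos++; // consume the matching closing bracket
    }

    private boolean peekIs(char c) {
        skipSpaces();
        return pos < src.length() && src.charAt(pos) == c;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(isValid("5 + (2 * {3 * 10}) * [6 / 2]")); // true
        System.out.println(isValid("(5 + 2}"));                      // false
        System.out.println(isValid("5 2"));                          // false
    }
}
```

Junk characters need no separate pre-check here: any character that is not a digit, an opening bracket, or an expected operator makes factor() throw or leaves unread input, and isValid returns false.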

Note that \\{-\\} means: "All characters whose Unicode values lie between the Unicode value of the { char and the Unicode value of the } char are valid" (which also lets | through), which you didn't mean there - you wanted just \\{\\} (as in: the { character and the } character). The same goes for \\[-\\] and \\(-\\). Once you fix this, your regex will correctly 'detect' any characters that are flat-out invalid, such as the letter 'a'. However, it will still allow a great many things that aren't valid, such as:

  • (
  • {} * ()
  • 5 2
  • ---5555---
  • ///
  • (5 2}
  • 99999999999999999999999999 (that's more than fits in int)
  • 5 2
  • {(5 2) * (3 [5 2)}

You can't write a regex that properly rejects all of these.
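The range behaviour of - between two bracket characters, mentioned above, can be checked directly. This is a small sketch; the class name RangeDemo and the helper accepts are made up for the demonstration, and only the bracket part of the question's character class is reproduced.

```java
import java.util.regex.Pattern;

final class RangeDemo {
    // As in the question: '-' between the escaped brackets forms ranges.
    static final Pattern WITH_RANGES = Pattern.compile("[\\[-\\]\\{-\\}\\(-\\)]");
    // Probably intended: each bracket listed as a literal character.
    static final Pattern LITERALS = Pattern.compile("[\\[\\]\\{\\}\\(\\)]");

    /** True if the single character c is matched by the character class p. */
    static boolean accepts(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // '|' sits between '{' and '}', and '\' sits between '[' and ']'.
        for (char c : new char[] {'|', '\\', '(', '}'}) {
            System.out.println(c + " ranges=" + accepts(WITH_RANGES, c)
                    + " literals=" + accepts(LITERALS, c));
        }
    }
}
```

With the ranges in place, | and \ count as bracket characters; with the literal list they do not, while the six real brackets are accepted either way.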

CodePudding user response:

The regular expression to validate just the mentioned set of characters may be simplified and used with String::matches:

// returns true if the expression contains only valid characters, false otherwise
private static boolean hasValidChars(String expr) {
    return expr.matches("[-+*/0-9\\s(){}\\[\\]]*");
}

For the given set of characters, only the square brackets [ and ] need to be escaped inside the character class; - does not need to be escaped when it is the first character of the class.

To return true when an invalid character is detected instead, the existing expression can be negated with a negative lookahead:

private static boolean hasInvalidChars(String expr) {
    return expr.matches("(?![-+*/0-9\\s(){}\\[\\]]*$).*");
}

or the character class itself can be negated with [^...] (.* has to be added on both sides to find an invalid character anywhere in the expression, because matches checks the entire string):

private static boolean hasInvalidChars(String expr) {
    return expr.matches(".*([^-+*/\\s0-9(){}\\[\\]]).*");
}
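Since matches always tests the whole string, another option is to compile the negated class once and search with Matcher.find, which needs no surrounding .* and also reports where the offending character is. A minimal sketch follows; the class and method names are hypothetical, and including + in the allowed set is an assumption about the intended operators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InvalidCharFinder {
    // Negated class: matches any single character outside the allowed set.
    private static final Pattern INVALID =
            Pattern.compile("[^-+*/0-9\\s(){}\\[\\]]");

    /** Index of the first disallowed character, or -1 if there is none. */
    static int firstInvalidIndex(String expr) {
        Matcher m = INVALID.matcher(expr);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("(33 / [11 - x])")); // 12
        System.out.println(firstInvalidIndex("(123 - 456) * 789")); // -1
    }
}
```

Because find scans for the class itself, the result does not depend on . matching line terminators, which matters when \s lets newlines into the input.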

Tests:

// needs: import java.util.Arrays;
for (String expr : Arrays.asList("(123   456) - 789", "abc   322", "(33 / [11 - x])")) {
    System.out.println(expr);
    System.out.println("invalid? " + hasInvalidChars(expr));
    System.out.println("valid? " + hasValidChars(expr));
    System.out.println("---");
}

Output:

(123   456) - 789
invalid? false
valid? true
---
abc   322
invalid? true
valid? false
---
(33 / [11 - x])
invalid? true
valid? false
---