Let's say I want to have a file with a bunch of arithmetic statements, such as:
1+1
2+2; 2+4;
;
5+5
5+7
Statements are separated by either ; or \n, and we don't care about empty statements (for example, when there are multiple consecutive newlines or ; characters).
What would be the proper way to deal with this? Here was my attempt at doing it:
grammar Calc;
program
: expressions
;
expressions
: expression (SEPARATOR expression)* SEPARATOR*
;
expression
: '(' expression ')' // parenExpression has highest precedence
| expression MULDIV expression // then multDivExpression
| expression ADDSUB expression // then addSubExpression
| OPERAND // finally the operand itself
;
MULDIV
: [*/]
;
ADDSUB
: [-+]
;
// 12 or .12 or 2. or 2.38
OPERAND
: [0-9]+ ('.' [0-9]*)?
| '.' [0-9]+
;
SEPARATOR
: [\n;]
;
I think the way I handle SEPARATOR is a bit of a hack. Maybe a better way would be to say that a statement can be either empty or non-empty, or something along those lines? What might be a clearer grammar to describe what I want? Additionally, is expression (SEPARATOR expression)* SEPARATOR* a poor way to accomplish this, or does it seem like a valid parser rule for expressions?
CodePudding user response:
Let me suggest a different approach: don't handle separators in the grammar, but instead split your input into single expressions before you actually parse them. This will simplify your grammar and you can easily tell if an expression is actually empty.
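// Split on newlines and semicolons in one pass (split accepts a regular expression).
const expressions = input.split(/[\n;]/);
expressions.forEach((expression) => {
    // Skip empty statements, e.g. those produced by consecutive separators.
    if (expression.trim().length > 0) {
        myParser.expression(expression);
    }
});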
With such a loop you can even stop parsing a long expression list as soon as you find an error (use a plain for...of loop with break for that, since forEach cannot be exited early). In general, that's much more flexible.
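To illustrate the early-exit idea, here is a minimal sketch. It assumes a hypothetical parseExpression function standing in for whatever entry point your generated parser exposes; the stand-in below only accepts simple "number operator number" statements and throws on anything else, so treat it as a placeholder rather than a real parser. The parseStatements helper is likewise just an illustrative name.

```javascript
// Hypothetical stand-in for the real parser entry point (an assumption, not ANTLR's API).
// Accepts only "number op number" and throws a SyntaxError otherwise.
function parseExpression(text) {
    const number = "(\\d+(?:\\.\\d*)?|\\.\\d+)";
    const pattern = new RegExp(`^\\s*${number}\\s*([-+*/])\\s*${number}\\s*$`);
    const match = pattern.exec(text);
    if (!match) {
        throw new SyntaxError(`Cannot parse statement: "${text}"`);
    }
    return { left: Number(match[1]), op: match[2], right: Number(match[3]) };
}

// Split the input on newlines and semicolons, skip empty statements,
// and stop at the first statement that fails to parse.
function parseStatements(input) {
    const results = [];
    for (const raw of input.split(/[\n;]/)) {
        const statement = raw.trim();
        if (statement.length === 0) {
            continue; // empty statement, nothing to parse
        }
        try {
            results.push(parseExpression(statement));
        } catch (error) {
            // Early exit: report what was parsed so far plus the error message.
            return { results, error: error.message };
        }
    }
    return { results, error: null };
}
```

Calling parseStatements on the sample input from the question should yield one result per non-empty statement, while an input with a bad statement in the middle returns only the statements parsed before it, together with the error message.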