I've been tasked with creating a grammar for a legacy DSL that's been in use for over 20 years. I've been told the original parser was written as a mess of regular expressions.
The syntax is generally of the "if this variable is n then set that variable to m" style.
My grammar works for almost all cases, but there are a few places where it baulks because of a (mis)use of the && (logical and) operator.
My Lark grammar (which is LALR(1)) is:
?start: statement*
?statement: expression ";"
?expression : assignment_expression
?assignment_expression : conditional_expression
| primary_expression assignment_op assignment_expression
?conditional_expression : logical_or_expression
| logical_or_expression "?" expression (":" expression)?
?logical_or_expression : logical_and_expression
| logical_or_expression "||" logical_and_expression
?logical_and_expression : equality_expression
| logical_and_expression "&&" equality_expression
?equality_expression : relational_expression
| equality_expression equals_op relational_expression
| equality_expression not_equals_op relational_expression
?relational_expression : additive_expression
| relational_expression less_than_op additive_expression
| relational_expression greater_than_op additive_expression
| relational_expression less_than_eq_op additive_expression
| relational_expression greater_than_eq_op additive_expression
?additive_expression : multiplicative_expression
| additive_expression add_op multiplicative_expression
| additive_expression sub_op multiplicative_expression
?multiplicative_expression : primary_expression
| multiplicative_expression mul_op primary_expression
| multiplicative_expression div_op primary_expression
| multiplicative_expression mod_op primary_expression
?primary_expression : variable
| variable "[" INT "]" -> array_accessor
| ESCAPED_STRING
| NUMBER
| unary_op expression
| invoke_expression
| "(" expression ")"
invoke_expression : ID ("." ID)* "(" argument_list? ")"
argument_list : expression ("," expression)*
unary_op : "-" -> negate_op
| "!" -> invert_op
assignment_op : "="
add_op : "+"
sub_op : "-"
mul_op : "*"
div_op : "/"
mod_op : "%"
equals_op : "=="
not_equals_op : "!="
greater_than_op : ">"
greater_than_eq_op : ">="
less_than_op : "<"
less_than_eq_op : "<="
ID : CNAME | CNAME "%%" CNAME
?variable : ID
| ID "@" ID -> namelist_id
| ID "@" ID "@" ID -> exptype_id
| "$" ID -> environment_id
%import common.WS
%import common.ESCAPED_STRING
%import common.CNAME
%import common.INT
%import common.NUMBER
%import common.CPP_COMMENT
%ignore WS
%ignore CPP_COMMENT
And some working examples are:
(a == 2) ? (c = 12);
(a == 2 && b == 3) ? (c = 12);
(a == 2 && b == 3) ? (c = 12) : d = 13;
(a == 2 && b == 3) ? ((c = 12) && (d = 13));
But there are a few places where I see this construct:
(a == 2 && b == 3) ? (c = 12 && d = 13);
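That is, the two assignments are joined by && but aren't in parentheses, and the parser rejects the second assignment operator. I assume this is because it's trying to parse it as (c = (12 && d) = 13): && binds tighter than =, so 12 && d is grouped first and the second = is left with nothing valid to assign to.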
I've tried changing the order of the rules (this is my first non-toy DSL, so there's been a lot of trial and error), but I either get similar errors or the precedence comes out wrong. Switching to the Earley algorithm doesn't fix it either.
CodePudding user response:
Instead of:
?assignment_expression : conditional_expression
| primary_expression assignment_op assignment_expression
?conditional_expression : logical_or_expression
| logical_or_expression "?" expression (":" expression)?
?logical_or_expression : logical_and_expression
| logical_or_expression "||" logical_and_expression
?logical_and_expression : equality_expression
| logical_and_expression "&&" equality_expression
?equality_expression : relational_expression
| equality_expression equals_op relational_expression
| equality_expression not_equals_op relational_expression
?relational_expression : additive_expression
| relational_expression less_than_op additive_expression
| relational_expression greater_than_op additive_expression
| relational_expression less_than_eq_op additive_expression
| relational_expression greater_than_eq_op additive_expression
?additive_expression : multiplicative_expression
| additive_expression add_op multiplicative_expression
| additive_expression sub_op multiplicative_expression
?multiplicative_expression : primary_expression
| multiplicative_expression mul_op primary_expression
| multiplicative_expression div_op primary_expression
| multiplicative_expression mod_op primary_expression
try:
?assignment_expression : conditional_expression
| primary_expression assignment_op expression
?conditional_expression : logical_or_expression
| logical_or_expression "?" expression (":" expression)?
?logical_or_expression : logical_and_expression
| logical_or_expression "||" expression
?logical_and_expression : equality_expression
| logical_and_expression "&&" expression
?equality_expression : relational_expression
| equality_expression equals_op expression
| equality_expression not_equals_op expression
?relational_expression : additive_expression
| relational_expression less_than_op expression
| relational_expression greater_than_op expression
| relational_expression less_than_eq_op expression
| relational_expression greater_than_eq_op expression
?additive_expression : multiplicative_expression
| additive_expression add_op expression
| additive_expression sub_op expression
?multiplicative_expression : primary_expression
| multiplicative_expression mul_op expression
| multiplicative_expression div_op expression
| multiplicative_expression mod_op expression
CodePudding user response:
Thanks for all the help, but as of this morning the customer and I agreed that the offending lines of code will be fixed, rather than torturing the grammar to make them work. There are only 9 ambiguous lines out of 3300, so the extra effort and hackiness weren't worth it.
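For anyone taking the same route, a rough way to locate candidate lines before fixing them by hand might look like the sketch below. It is only a heuristic and an assumption on my part about what the offending lines look like (an && followed directly by an identifier and a single =); the pattern and the function name find_suspect_lines are hypothetical, and it will not catch every variant, for example an unparenthesised assignment on the left of the &&.

```python
import re

# Heuristic: "&&" followed directly by an identifier (optionally "$"-prefixed,
# optionally indexed like name[3]) and a single "=" that is not part of "==".
UNPARENTHESIZED_ASSIGNMENT = re.compile(
    r"&&\s*\$?[A-Za-z_][\w@%]*(?:\s*\[\s*\d+\s*\])?\s*=(?!=)"
)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching the heuristic."""
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if UNPARENTHESIZED_ASSIGNMENT.search(text)
    ]


# The three sample statements are taken from the question.
samples = [
    "(a == 2 && b == 3) ? (c = 12);",
    "(a == 2 && b == 3) ? ((c = 12) && (d = 13));",
    "(a == 2 && b == 3) ? (c = 12 && d = 13);",
]

for number, text in find_suspect_lines(samples):
    print(number, text)
```

Only the third sample is flagged here, since in the first two every && is followed either by a comparison or by an opening parenthesis.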