How can I fix my DSL grammar to parse a problem statement?

Time:01-09

I've been tasked with creating a grammar for a legacy DSL that's been in use for over 20 years. I'm told the original parser was a mess of regular expressions.

The syntax is generally of the "if this variable is n then set that variable to m" style.

My grammar works for almost all cases, but there are a few places where it baulks because of a (mis)use of the && (logical and) operator.

My Lark grammar (which is LALR(1)) is:

?start: statement*

?statement: expression ";"

?expression : assignment_expression

?assignment_expression : conditional_expression
                       | primary_expression assignment_op assignment_expression

?conditional_expression : logical_or_expression
                        | logical_or_expression "?" expression (":" expression)?

?logical_or_expression : logical_and_expression
                       | logical_or_expression "||" logical_and_expression

?logical_and_expression : equality_expression
                        | logical_and_expression "&&" equality_expression

?equality_expression : relational_expression
                     | equality_expression equals_op relational_expression
                     | equality_expression not_equals_op relational_expression

?relational_expression : additive_expression
                       | relational_expression less_than_op additive_expression
                       | relational_expression greater_than_op additive_expression
                       | relational_expression less_than_eq_op additive_expression
                       | relational_expression greater_than_eq_op additive_expression

?additive_expression : multiplicative_expression
                     | additive_expression add_op multiplicative_expression
                     | additive_expression sub_op multiplicative_expression

?multiplicative_expression : primary_expression
                           | multiplicative_expression mul_op primary_expression
                           | multiplicative_expression div_op primary_expression
                           | multiplicative_expression mod_op primary_expression

?primary_expression : variable
                    | variable "[" INT "]"    -> array_accessor
                    | ESCAPED_STRING
                    | NUMBER
                    | unary_op expression
                    | invoke_expression
                    | "(" expression ")"

invoke_expression : ID ("." ID)* "(" argument_list? ")"
argument_list : expression ("," expression)*

unary_op : "-" -> negate_op
         | "!" -> invert_op
assignment_op : "="
add_op : "+"
sub_op : "-"
mul_op : "*"
div_op : "/"
mod_op : "%"
equals_op : "=="
not_equals_op : "!="
greater_than_op : ">"
greater_than_eq_op : ">="
less_than_op : "<"
less_than_eq_op : "<="

ID : CNAME | CNAME "%%" CNAME

?variable : ID
    | ID "@" ID           -> namelist_id
    | ID "@" ID "@" ID    -> exptype_id
    | "$" ID              -> environment_id

%import common.WS
%import common.ESCAPED_STRING
%import common.CNAME
%import common.INT
%import common.NUMBER
%import common.CPP_COMMENT

%ignore WS
%ignore CPP_COMMENT

And some working examples are:

(a == 2) ? (c = 12);
(a == 2 && b == 3) ? (c = 12);
(a == 2 && b == 3) ? (c = 12) : d = 13;
(a == 2 && b == 3) ? ((c = 12) && (d = 13));

But there are a few places where I see this construct:

(a == 2 && b == 3) ? (c = 12 && d = 13);

That is, the two assignments are joined by && but aren't in parentheses, and the parser rejects the second assignment operator. I assume this is because && binds tighter than =, so it's trying to parse it as (c = (12 && d) = 13).

I've tried changing the order of the rules (this is my first non-toy DSL, so there's been a lot of trial and error), but I either get similar errors or the precedence is wrong. Switching to the Earley algorithm doesn't fix it either.

CodePudding user response:

Instead of:

?assignment_expression : conditional_expression
                       | primary_expression assignment_op assignment_expression

?conditional_expression : logical_or_expression
                        | logical_or_expression "?" expression (":" expression)?

?logical_or_expression : logical_and_expression
                       | logical_or_expression "||" logical_and_expression

?logical_and_expression : equality_expression
                        | logical_and_expression "&&" equality_expression

?equality_expression : relational_expression
                     | equality_expression equals_op relational_expression
                     | equality_expression not_equals_op relational_expression

?relational_expression : additive_expression
                       | relational_expression less_than_op additive_expression
                       | relational_expression greater_than_op additive_expression
                       | relational_expression less_than_eq_op additive_expression
                       | relational_expression greater_than_eq_op additive_expression

?additive_expression : multiplicative_expression
                     | additive_expression add_op multiplicative_expression
                     | additive_expression sub_op multiplicative_expression

?multiplicative_expression : primary_expression
                           | multiplicative_expression mul_op primary_expression
                           | multiplicative_expression div_op primary_expression
                           | multiplicative_expression mod_op primary_expression

try:

?assignment_expression : conditional_expression
                       | primary_expression assignment_op expression

?conditional_expression : logical_or_expression
                        | logical_or_expression "?" expression (":" expression)?

?logical_or_expression : logical_and_expression
                       | logical_or_expression "||" expression

?logical_and_expression : equality_expression
                        | logical_and_expression "&&" expression

?equality_expression : relational_expression
                     | equality_expression equals_op expression
                     | equality_expression not_equals_op expression

?relational_expression : additive_expression
                       | relational_expression less_than_op expression
                       | relational_expression greater_than_op expression
                       | relational_expression less_than_eq_op expression
                       | relational_expression greater_than_eq_op expression

?additive_expression : multiplicative_expression
                     | additive_expression add_op expression
                     | additive_expression sub_op expression

?multiplicative_expression : primary_expression
                           | multiplicative_expression mul_op expression
                           | multiplicative_expression div_op expression
                           | multiplicative_expression mod_op expression
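
To see what this change does to grouping, here is a toy model in Python. It is a hand-written recursive-descent parser, not Lark, and it covers only =, &&, ==, + and *. The names parse, LEVELS and greedy_rhs are made up for this sketch. With greedy_rhs=False the right operand of each binary operator comes from the next-tighter level, as in the question's grammar; with greedy_rhs=True it is a full expression, as in the rules suggested above. How Lark's LALR(1) or Earley parser resolves the resulting ambiguity may differ from this model.

```python
import re

# Binary operators grouped by precedence, loosest-binding first.
LEVELS = [["&&"], ["=="], ["+"], ["*"]]


def tokenize(text):
    # Sketch only: characters that match nothing are silently skipped.
    return re.findall(r"&&|==|[=()+*]|\w+", text)


def parse(text, greedy_rhs):
    """Parse text into nested tuples of the form (op, left, right)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expression():
        left = binary(0)
        if peek() == "=":
            # Simplification: only a bare name or number may be assigned to.
            if isinstance(left, tuple):
                raise SyntaxError(f"cannot assign to {left!r}")
            take()
            return ("=", left, expression())
        return left

    def binary(level):
        if level == len(LEVELS):
            return primary()
        node = binary(level + 1)
        while peek() in LEVELS[level]:
            op = take()
            # The only difference between the two grammars in this model.
            right = expression() if greedy_rhs else binary(level + 1)
            node = (op, node, right)
        return node

    def primary():
        tok = take()
        if tok == "(":
            node = expression()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

In this model, parse("c = 12 && d = 13", greedy_rhs=False) raises SyntaxError at the second =, while greedy_rhs=True groups it as c = (12 && (d = 13)). The cost is that a * b + c then groups as a * (b + c), so ordinary precedence is lost on the right-hand side.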

CodePudding user response:

Thanks for all the help, but as of this morning the customer and I agreed that the offending lines of code will be fixed, rather than torturing the grammar to make them work. There are only 9 out of 3300 lines of code that are ambiguous, so the extra effort and hackiness weren't worth it.
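
For anyone in the same position, a rough way to locate such lines is a regex scan before running the real parser. This is only a heuristic sketch: SUSPECT and find_suspect_lines are hypothetical names, and the pattern assumes one statement per line and no && inside strings or comments. It looks for && followed directly by a bare variable (optionally indexed) and a single =, so parenthesised assignments and == comparisons are not flagged.

```python
import re

# "&&", then a bare name such as d, $d, a@b or x[3], then "=" that is not "==".
SUSPECT = re.compile(r"&&\s*\$?[\w@%]+(?:\[\d+\])?\s*=(?!=)")


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching SUSPECT."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPECT.search(line)
    ]


# The five statements quoted in the question.
SAMPLE = [
    "(a == 2) ? (c = 12);",
    "(a == 2 && b == 3) ? (c = 12);",
    "(a == 2 && b == 3) ? (c = 12) : d = 13;",
    "(a == 2 && b == 3) ? ((c = 12) && (d = 13));",
    "(a == 2 && b == 3) ? (c = 12 && d = 13);",
]

hits = find_suspect_lines(SAMPLE)
```

On these five statements only the last one is reported. Anything the scan flags still needs a manual look, since the right fix (usually adding parentheses around each assignment) depends on what the legacy parser actually did with the line.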
