Home > Enterprise >  Remove ambiguity in grammar for expression casting
Remove ambiguity in grammar for expression casting

Time:10-15

I'm working on a small translator in JISON, but I've run into a problem when trying to implement the cast of expressions, since it generates an ambiguity in the grammar when trying to add the production of cast. I need to add the productions to the cast option, so in principle I should have something like this:

expr: OPEN_PAREN type CLOSE_PAREN expr

However, since in my grammar I must be able to have expressions in parentheses, I already have the following production, so the grammar is now ambiguous:

expr: '(' expr ')'

Initially I had the following grammar for expressions:

expr                :       expr PLUS expr
                    |       expr MINUS expr
                    |       expr TIMESexpr
                    |       expr DIV expr
                    |       expr MOD expr
                    |       expr POWER expr
                    |       MINUS expr %prec UMINUS
                    |       expr LESS_THAN expr
                    |       expr GREATER_THAN expr
                    |       expr LESS_OR_EQUAL expr
                    |       expr GREATER_OR_EQUAL expr
                    |       expr EQUALS expr
                    |       expr DIFFERENT expr
                    |       expr OR expr
                    |       expr AND expr
                    |       NOT expr
                    |       OPEN_PAREN expr CLOSE_PAREN
                    |       INT_LITERAL
                    |       DOUBLE_LITERAL
                    |       BOOLEAN_LITERAL
                    |       CHAR_LITERAL
                    |       STRING_LTIERAL
                    |       ID;

Ambiguity was handled by applying the following precedence and associativity rules:

%left                       'ASSIGNEMENT'
%left                       'OR'
%left                       'AND'
%left                       'XOR'
%left                       'EQUALS', 'DIFFERENT'
%left                       'LESS_THAN ', 'GREATER_THAN ', 'LESS_OR_EQUAL ',  'GREATER_OR_EQUAL '
%left                       'PLUS', 'MINUS'
%left                       'TIMES', 'DIV', 'MOD'
%right                      'POWER'
%right                      'UMINUS', 'NOT'

I can't find a way to write a production that allows me to add the cast without falling into an ambiguity. Is there a way to modify this grammar without having to write an unambiguous grammar? Is there a way I can resolve this issue using JISON, which I may not have been able to see?

Any ideas are welcome.

This is what I was trying, however it's still ambiguous:

expr: OPEN_PAREN type CLOSE_PAREN expr
    | OPEN_PAREN expr CLOSE_PAREN

CodePudding user response:

Usually the way this problem is handled is by having type names as distinct keywords that can't be expressions by themselves. That way, after seeing an (, the next token being a type means it is a cast and the next token being an identifier means it is an expression, so there is no ambiguity.

However, your grammar appears to allow type names (INT, DOUBLE, etc) as expressions. This doesn't make a lot of sense, and causes your parsing problem, as differentiating between a cast and a parenthesized expression will require more lookahead.

The easiest fix would be to remove these productions (though you should still have something like expr : CONSTANT_LITERAL for literal constants)

CodePudding user response:

The problem is that you don't specify the precedence of the cast operator, which is effectively a unary operator whose precedence should be the same as any other unary operator, such as NOT. (See below for a discussion of UMINUS.)

The parsing conflicts you received are not related to the fact that expr: '(' expr ')' is also a production. That would prevent LL(1) parsing, because the two productions start with the same sequence, but that's not an ambiguity. It doesn't affect bottom-up parsing in any way; the two productions are unambiguously recognisable.

Rather, the conflicts are the result of the parser not knowing whether (type)a b means ((type)a b or (type)(a b), which is no different from the ambiguity of unary minus (should -a/b be parsed as (-a)/b or -(a/b)?), which is resolved by putting UMINUS at the end of the precedence list.

In the case of casts, you don't need to use a %prec declaration with a pseudo-token; that's only necessary for - because - could also be a binary operator, with a different (reduction) precedence. The precedence of the production:

expr: '(' type ')' expr

is ) (at least in yacc/bison), because that's the last terminal in the production. There's no need to give ) a shift precedence, because the grammar requires it to always be shifted.

Three notes:

  1. Assignment is right-associative. a = b = 3 means a = (b = 3), not (a = b) = 3.

  2. In the particular case of unary minus (and, by extension, unary plus if you feel like implementing it), there's a good argument for putting it ahead of exponentiation, so that -a**b is parsed as -(a**b). But that doesn't mean you should move other unary operators up from the end; (type)a**b should be parsed as ((type)a)**b. Nothing says that all unary operators have to have the same precedence.

  3. When you add postfix operators -- notably function calls and array subscripts -- you will want to put them after the unary prefix operators. -a[3] most certainly does not mean (-a)[3]. These postfix operators are, in a way, duals of the prefix operators. As noted above, expr: '(' type ')' expr has precedence ')', which is only used as a reduction precedence. Conversely, expr: expr '(' expr-list ')' does not require a reduction precedence; the relevant token whose shift precedence needs to be declared is (.

So, according to all the above, your precedence declarations might be:

%right          ASSIGNMENT
%left           OR
%left           AND
%left           XOR
%left           EQUALS DIFFERENT
%left           LESS_THAN GREATER_THAN LESS_OR_EQUAL GREATER_OR_EQUAL 
%left           PLUS MINUS
%left           TIMES DIV MOD
%precedence     UMINUS
%right          POWER
%precedence     NOT CLOSE_PAREN
%precedence     OPEN_PAREN OPEN_BRACKET

%precedence in Bison is used to declare precedence levels for operators which have no associativity (in other words, unary operators, whether prefix or postfix). (That's not the same as saying that the operator cannot be used associatively, as is often required for comparison operators.) If you're not using Bison, you can use either %left or %right indifferently for unary operators.

  • Related