Home > Software engineering >  How to resolve this grammar ambiguity?
How to resolve this grammar ambiguity?

Time:08-19

I am able to parse the following valid SQL expression:

(select 1 limit 1) union all select 1 union all (select 2);

However, it seems there is an ambiguity in it, which I've been have a lot of trouble resolving. Here is a working version of the program (I've obviously cut down the statements just to create a minimal producible question) --

parser grammar DBParser;
options { tokenVocab = DBLexer;}

root
    : selectStatement SEMI? EOF
    ;

selectStatement
     : (selectStatementWithParens|selectClause|setOperation)
     limitClause?
     ;

selectClause
    : SELECT NUMBER
    ;

limitClause
    : LIMIT NUMBER
    ;
 
selectStatementWithParens
    : OPEN_PAREN selectStatement CLOSE_PAREN
    ;

setOperation:
    (selectClause | selectStatementWithParens)
    (setOperand (selectClause | selectStatementWithParens))*
    ;

setOperand
    : UNION ALL?
    ;
lexer grammar DBLexer;
options { caseInsensitive=true; }
SELECT              :           'SELECT';                   // SELECT *...
LIMIT               :           'LIMIT';                    // ORDER BY x LIMIT 20
ALL                 :           'ALL';                      // SELECT ALL vs. SELECT DISTINCT; WHERE ALL (...); UNION ALL...
UNION               :           'UNION';                    // Set operation

SEMI                :           ';';                        // Statement terminator
OPEN_PAREN          :           '(';                        // Function calls, object declarations
CLOSE_PAREN         :           ')';

NUMBER
     : [0-9] 
    ;

WHITESPACE
    : [ \t\r\n] -> skip
    ;

Where is the ambiguity coming from, and what could be a possible way to solve this?


Update: I'm not sure if this is the solution, but it seems the following helped eliminate that ambiguity:

 selectStatement:
     withClause?
     (selectStatementWithParens|selectClause)
     (setOperand (selectClause|selectStatementWithParens))*
     orderClause?
     (limitClause offsetClause?)?
     ;

In other words -- making it such that the setOperand doesn't re-start with the select.

CodePudding user response:

your setOpertaion rule can match single selectClause or selectStatementWithParens (because you use the * cardinality for the second half of the rule, so 0 instances of the second half still matches the rule). This means that a selectClause can match the selectClause rule in selectStatement, or it could be used to construct a setOperation (which is the other alternative in your ambiguity).

If you change setOperation to use cardinality for the second half of the rule, you resolve the ambiguity.

setOperation
    : (selectClause | selectStatementWithParens)
      (setOperand (selectClause | selectStatementWithParens)) 
    ;

This also seems logical, that you'd only want to consider something a setOperation if there's a setOperand involved.

That explains and corrects the ambiguity, but still leaves you with a "max k" of 7.

  • Related