Is it possible to break up this statement without introducing ambiguity?

Time:08-21

I have the following grammar which works fine:

selectStatement
    : simpleSelectStatement (setOperand selectStatement)?;

However, I would like to break up the selectStatement so it tells us at the top level whether it contains a set operation at all. For example:

selectStatement
    : simpleSelectStatement | setOperation
    ;

setOperation
    : simpleSelectStatement (setOperand selectStatement)
    ;

Unfortunately, to parse this unambiguously, the parser has to scan the entire first SELECT to see whether a UNION follows before it knows which rule to delegate to. For example, the statement in my test (shown in a screenshot in the original post) needed 24 tokens of lookahead just to figure out what type of statement it is!
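
To make the cost concrete, here is a rough Python sketch that counts how many tokens have to be read before a top-level set operand (or the end of the statement) becomes visible. It is not ANTLR's prediction algorithm, and `tokenize` and `tokens_to_decide` are hypothetical helper names; it only assumes the token shapes of the grammar in this question.

```python
import re

# Set operands as spelled in the question's grammar (assumption: keywords only).
SET_OPERANDS = {"UNION", "EXCLUDE", "INTERSECT"}


def tokenize(sql):
    # Identifiers/keywords, numbers and single-character punctuation; whitespace is skipped.
    pattern = r"[A-Za-z_][A-Za-z_0-9]*|[0-9]+|[(),;]"
    return [tok.upper() for tok in re.findall(pattern, sql)]


def tokens_to_decide(sql):
    """Count the tokens read until the first top-level set operand, ';' or end of input."""
    depth = 0
    tokens = tokenize(sql) + ["<EOF>"]
    for count, tok in enumerate(tokens, start=1):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0 and (tok in SET_OPERANDS or tok in (";", "<EOF>")):
            # Only here can "simple select" and "set operation" be told apart.
            return count
    return len(tokens)


if __name__ == "__main__":
    print(tokens_to_decide("SELECT 1 FROM a, 2 FROM b, 3 FROM c UNION SELECT 4"))
```

The count grows with the length of the first operand, which is why a long select list or a parenthesized subquery in front of the UNION pushes the decision so far into the input.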

Is there a way to resolve this, or is the only option to put everything back into one root statement rule? The UNION usually comes so late in the statement that deciding the statement type up front could take almost an entire parse by itself. Here is a full working grammar to test with:

grammar DBParser;
options { caseInsensitive=true; }

root
    : selectStatement SEMI? EOF
    ;

selectStatement
    : simpleSelectStatement | setOperation
    ;

setOperation
    : simpleSelectStatement (setOperand selectStatement)
    ;

simpleSelectStatement
    : selectClause
    | OPEN_PAREN selectStatement CLOSE_PAREN
    ;

selectClause
    : SELECT selectItem (COMMA selectItem)*
    ;
selectItem
    : NUMBER ( FROM IDENTIFIER )?
    ;

setOperand
    : UNION ALL?|EXCLUDE|INTERSECT
    ;

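SELECT              :           'SELECT';                   // SELECT *...
LIMIT               :           'LIMIT';                    // ORDER BY x LIMIT 20
ALL                 :           'ALL';                      // SELECT ALL vs. SELECT DISTINCT; WHERE ALL (...); UNION ALL...
UNION               :           'UNION';                    // Set operation
EXCLUDE             :           'EXCLUDE';                  // Set operation (referenced by setOperand)
INTERSECT           :           'INTERSECT';                // Set operation (referenced by setOperand)
FROM                :           'FROM';                     // SELECT ... FROM t
AS                  :           'AS';                       // Alias
WITH                :           'WITH';                     // Common table expression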

SEMI                :           ';';                        // Statement terminator
OPEN_PAREN          :           '(';                        // Function calls, object declarations
CLOSE_PAREN         :           ')';
COMMA         :           ',';

NUMBER
    : [0-9]+    // one or more digits, so multi-digit literals lex as one token
    ;
IDENTIFIER
    : [A-Z_] [A-Z_0-9]*
    ;
WHITESPACE
    : [ \t\r\n] -> skip
    ;

Answer:

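Maybe just try labelled alternatives? Fold everything into one left-recursive rule and label each alternative. The label (and the context class ANTLR generates for it) then tells you at the top level whether the statement is a set operation, and the parser should no longer have to scan ahead for a UNION before choosing a rule: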

grammar DBParser;
options { caseInsensitive=true; }

root
    : selectStatement SEMI? EOF
    ;

selectStatement
    : SELECT selectItem (COMMA selectItem)* # simpleSelect
    | OPEN_PAREN selectStatement CLOSE_PAREN # parenSelect
    | selectStatement setOperand selectStatement # setOperation
    ;

//setOperation
//    : simpleSelectStatement setOperand selectStatement # set
//    ;

//simpleSelectStatement:
//     selectClause
//     | OPEN_PAREN selectStatement CLOSE_PAREN
//    ;

//selectClause
//    : SELECT selectItem (COMMA selectItem)*
//    ;
selectItem
    : NUMBER ( FROM IDENTIFIER )?
    ;

setOperand
    : UNION ALL?|EXCLUDE|INTERSECT
    ;

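SELECT              :           'SELECT';                   // SELECT *...
LIMIT               :           'LIMIT';                    // ORDER BY x LIMIT 20
ALL                 :           'ALL';                      // SELECT ALL vs. SELECT DISTINCT; WHERE ALL (...); UNION ALL...
UNION               :           'UNION';                    // Set operation
EXCLUDE             :           'EXCLUDE';                  // Set operation (referenced by setOperand)
INTERSECT           :           'INTERSECT';                // Set operation (referenced by setOperand)
FROM                :           'FROM';                     // SELECT ... FROM t
AS                  :           'AS';                       // Alias
WITH                :           'WITH';                     // Common table expression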

SEMI                :           ';';                        // Statement terminator
OPEN_PAREN          :           '(';                        // Function calls, object declarations
CLOSE_PAREN         :           ')';
COMMA         :           ',';

NUMBER
    : [0-9]+    // one or more digits, so multi-digit literals lex as one token
    ;
IDENTIFIER
    : [A-Z_] [A-Z_0-9]*
    ;
WHITESPACE
    : [ \t\r\n] -> skip
    ;
