I am able to parse the following valid SQL expression:
(select 1 limit 1) union all select 1 union all (select 2);
However, it seems there is an ambiguity in it, which I've been have a lot of trouble resolving. Here is a working version of the program (I've obviously cut down the statements just to create a minimal producible question) --
parser grammar DBParser;
options { tokenVocab = DBLexer;}
root
: selectStatement SEMI? EOF
;
selectStatement
: (selectStatementWithParens|selectClause|setOperation)
limitClause?
;
selectClause
: SELECT NUMBER
;
limitClause
: LIMIT NUMBER
;
selectStatementWithParens
: OPEN_PAREN selectStatement CLOSE_PAREN
;
setOperation:
(selectClause | selectStatementWithParens)
(setOperand (selectClause | selectStatementWithParens))*
;
setOperand
: UNION ALL?
;
lexer grammar DBLexer;
options { caseInsensitive=true; }
SELECT : 'SELECT'; // SELECT *...
LIMIT : 'LIMIT'; // ORDER BY x LIMIT 20
ALL : 'ALL'; // SELECT ALL vs. SELECT DISTINCT; WHERE ALL (...); UNION ALL...
UNION : 'UNION'; // Set operation
SEMI : ';'; // Statement terminator
OPEN_PAREN : '('; // Function calls, object declarations
CLOSE_PAREN : ')';
NUMBER
: [0-9]
;
WHITESPACE
: [ \t\r\n] -> skip
;
Where is the ambiguity coming from, and what could be a possible way to solve this?
Update: I'm not sure if this is the solution, but it seems the following helped eliminate that ambiguity:
selectStatement:
withClause?
(selectStatementWithParens|selectClause)
(setOperand (selectClause|selectStatementWithParens))*
orderClause?
(limitClause offsetClause?)?
;
In other words -- making it such that the setOperand
doesn't re-start with the select
.
CodePudding user response:
your setOpertaion
rule can match single selectClause
or selectStatementWithParens
(because you use the *
cardinality for the second half of the rule, so 0 instances of the second half still matches the rule). This means that a selectClause
can match the selectClause
rule in selectStatement
, or it could be used to construct a setOperation
(which is the other alternative in your ambiguity).
If you change setOperation
to use
cardinality for the second half of the rule, you resolve the ambiguity.
setOperation
: (selectClause | selectStatementWithParens)
(setOperand (selectClause | selectStatementWithParens))
;
This also seems logical, that you'd only want to consider something a setOperation
if there's a setOperand
involved.
That explains and corrects the ambiguity, but still leaves you with a "max k" of 7.