Home > Software engineering >  How to define a SQL set operation in a grammar?
How to define a SQL set operation in a grammar?

Time:08-19

I am having a tough time figuring out a way to handle the SQL set operation, that it, being able to do things like query1 UNION ALL query2. It should support the following statements:

select 1 union all select 2                                 -- ok
with cte as (select 1) select 1 from cte union all select 2 -- ok
(select 1 limit 1) union all select 1 union all (select 2)  -- ok

And should not support the following, as one of the table operands to the query expression needs to be parenthesized:

select 1 from cte union all with cte as (select 1)          -- no
select 1 limit 1 union select 1                             -- no

Here is the grammar I have so far:

 selectStatement:
     withClause?
     (selectStatementWithParens|selectClause|setOperation)
     orderClause?
     (limitClause offsetClause?)?
     ;

selectStatementWithParens
    : OPEN_PAREN selectStatement CLOSE_PAREN
    ;

selectClause:
    SELECT // a bunch of other stuff...
    ;

setOperation:
    (selectClause | selectStatementWithParens)
    (setOperand (selectClause | selectStatementWithParens))*
    ;

setOperand
    : UNION (DISTINCT|ALL)?
    | INTERSECT DISTINCT?
    | EXCEPT DISTINCT?
    ;

Though it does work -- it seems to be a bit heavy handed, and I don't like having the selectStatementWithParens underneath the WITH (though it does work, such the WITH is optional, and so it can support (select 1) or (with x as (select 1) select 1)). How could this be improved, or does it already look pretty good?

CodePudding user response:

Consider this:

parser grammar DBParser
    ;
options {
    tokenVocab = DBLexer;
}

root: selectStatement SEMI? EOF;

selectStatement
    : selectClause # selectClauseStatement
    | selectStatement limitClause # limitedSelectStatement
    | OPEN_PAREN selectStatement CLOSE_PAREN # parenSelectStatement
    | selectStatement setOperand selectStatement # setOperationStatement
    ;

selectClause: SELECT NUMBER;

limitClause: LIMIT NUMBER;

setOperand: UNION ALL?;

It gives a usable parse tree (so far as I can tell) and has max K = 1

enter image description here

I may have missed some subtlety, but perhaps it puts you on a better path.

  • Related