How to break up a grammar file per statement

Time:08-13

I have the following single-file grammar which will parse a (fake) SELECT statement or ALTER statement:

grammar Calc;

statements
    : statement (NEWLINE statement)*
    ;

statement
    : select_statement
    | alter_statement
    ;

select_statement
    : SELECT IDENTIFIER
    ;

alter_statement
    : ALTER IDENTIFIER
    ;

NEWLINE: '\n';
ALTER: 'ALTER';
SELECT: 'SELECT';
IDENTIFIER: 'one' | 'two' | 'three';
WS: [ \t\n\r]  -> skip; // We put it at the end and ONLY capture single whitespace characters,
                        // so it's always overridden by anything defined above it, such as NEWLINE

And the test input:

SELECT one
ALTER two

I would like to split this grammar into separate parser files, one per statement. This can be done in two steps. In the first step I'll separate the parser and lexer:

parser grammar SQLParser;
options { tokenVocab = SQLLexer; }

program: statements EOF;
statements: statement (NEWLINE statement)*;
statement
    : select_statement
    | alter_statement
    ;

select_statement: SELECT IDENTIFIER;
alter_statement: ALTER IDENTIFIER;

lexer grammar SQLLexer;
options { caseInsensitive=true; }

NEWLINE: '\n';
ALTER: 'ALTER';
SELECT: 'SELECT';
IDENTIFIER: 'one' | 'two' | 'three';
WS: [ \t\n\r]  -> skip;

How would I then separate the two statements into their own files, such as SQLSelectParser.g4 and SQLAlterParser.g4? Are there any downsides to breaking up multiple complex statements into their own files? (Of course, the example here is trivial, but it's just to ask the question.)


Update: it seems the following approach works fine, though it'd be good to get someone experienced to comment on the approach and on whether it's a good idea to do it at all:

# SQLParser.g4
parser grammar SQLParser;
import SQLAlterParser, SQLSelectParser;
options { tokenVocab = SQLLexer; }

program
    : statements EOF
    ;

statements
    : statement (NEWLINE statement)*
    ;

statement
    : select_statement
    | alter_statement
    ;

# SQLSelectParser.g4
parser grammar SQLSelectParser;
options { tokenVocab = SQLLexer; }

select_statement
    : SELECT IDENTIFIER
    ;

# SQLAlterParser.g4
parser grammar SQLAlterParser;
options { tokenVocab = SQLLexer; }

alter_statement
    : ALTER IDENTIFIER
    ;

# SQLLexer.g4
lexer grammar SQLLexer;
options { caseInsensitive=true; }

NEWLINE: '\n';
ALTER: 'ALTER';
SELECT: 'SELECT';
IDENTIFIER: 'one' | 'two' | 'three';
WS: [ \t\n\r]  -> skip;

Answer:

An ANTLR import is not a simple include. Each grammar is treated as its own grammar "object".

Importing gives you something closer to superclass behavior, as explained in the Grammar Imports documentation.
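To make the superclass analogy concrete, here is a minimal sketch based on the grammars from the question. The redefined select_statement rule is hypothetical and only there for illustration; as I read the Grammar Imports documentation, a rule defined in the main grammar overrides a rule of the same name coming from an imported grammar, much like a method override.

```antlr
// SQLParser.g4 (main grammar): a sketch of rule overriding, not a recommendation
parser grammar SQLParser;
import SQLAlterParser, SQLSelectParser;
options { tokenVocab = SQLLexer; }

program: statements EOF;
statements: statement (NEWLINE statement)*;
statement
    : select_statement
    | alter_statement
    ;

// Hypothetical override: because the main grammar defines select_statement,
// this version should be used, and the one in SQLSelectParser.g4 is ignored.
select_statement
    : SELECT IDENTIFIER+
    ;
```

With a plain textual include, two definitions of select_statement would be a conflict; with import, the main grammar's version is expected to win without any error being reported.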

You may find that it gets tricky to keep everything straight (especially as you find common sub-rules).
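The common sub-rule problem might look like the sketch below. The table_name rule is hypothetical and does not appear in the question's grammars; the point is only that both statement files end up defining a helper with the same name. The documentation says that when several imported grammars define the same rule, ANTLR uses the first version it finds.

```antlr
// SQLSelectParser.g4 (hypothetical extension of the question's file)
parser grammar SQLSelectParser;
// options omitted in this sketch; the main grammar sets tokenVocab

select_statement: SELECT table_name;
table_name: IDENTIFIER;                 // one name

// ---- separate file ----
// SQLAlterParser.g4 (hypothetical extension of the question's file)
parser grammar SQLAlterParser;

alter_statement: ALTER table_name;
table_name: IDENTIFIER IDENTIFIER;      // same rule name, different shape
```

With `import SQLAlterParser, SQLSelectParser;` in the main grammar, the table_name from SQLAlterParser would be found first, so select_statement would end up using a definition that its own file never shows. Moving shared rules into a single place avoids this, but at that point the files are no longer independent, which is part of the added complexity.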

As the documentation shows, it can definitely be useful.

You didn't elaborate on your motivation for breaking things up. If it's just to split the grammar into smaller source files for organizational purposes, the way imports work may cost you more in complexity than you save compared with keeping the parser in a single file (or just splitting the lexer and parser, which is very common).
