Labelling a statement in antlr -- Performance and Size costs-CodePudding

Is there a suggested practice on whether to label large alternate rules or not?

I thought that it would basically be a nice thing that you could get "for free", but in some very basic tests of it with two basic files that parse expressions -- one with and one without expressions -- the performance is a toss up, and the size is almost 50% larger.

Here are the two grammars I used for testing:

grammar YesLabels;
root: (expr ';')* EOF;

expr
    : '(' expr ')'      # ParenExpr
    | '-' expr          # UMinusExpr
    | ' ' expr          # UPlusExpr
    | expr '*' expr     # TimesExpr
    | expr '/' expr     # DivExpr
    | expr '-' expr     # MinusExpr
    | expr ' ' expr     # PlusExpr
    | Atom              # AtomExpr
    ;

Atom:
    [a-z]  | [0-9]  | '\'' Atom '\''
    ;

WHITESPACE: [ \t\r\n] -> skip;

grammar NoLabels;
root: (expr ';')* EOF;

expr
    : '(' expr ')'
    | '-' expr
    | ' ' expr
    | expr '*' expr
    | expr '/' expr
    | expr '-' expr
    | expr ' ' expr
    | Atom
    ;

Atom:
    [a-z]  | [0-9]  | '\'' Atom '\''
    ;

WHITESPACE: [ \t\r\n] -> skip;

Testing this on the following expression (repeated 100k times, ~ 2MB file):

1 1-2--(2 3/2-5) 4;

I get the following timings and size:

$ ant YesLabels root ../tests/expr.txt

real    0m1.096s # varies quite a bit, sometimes larger than the other
116K    ./out

$ ant NoLabels root ../tests/expr.txt

real    0m0.821s # varies quite a bit, sometimes smaller than the other
80K ./out

CodePudding user response：

So when you use labelled alternatives, you'll get more classes. This is (IMHO) the advantage, it makes listeners easier to target and each class is simpler with properties that apply only to that alternative.

While that means the executable will be larger, with a meaningful sized parse tree, the memory savings of the targeted class instances will probably make up for the difference in executable size. (in your example, the difference is 46K, that's NOTHING memory wise to the memory used by the parse tree, token stream, etc. of your actual running program.

Your sample input is probably not big enough to really show any difference in performance, but, as mentioned elsewhere, you should first focus on a usable parse tree and then address size/performance should you determine that it's actually an issue.

Not everyone agrees, but in my opinion, the benefits of the labelled alternatives are substantial, and the space/performance needs are negligible (if even measurable)