Can I force Antlr to parse a syntactically invalid program and return a parse tree?

Time:12-13

I'm working on a project that needs to parse various syntactically invalid programs, which are generated by randomly inserting a run of consecutive tokens into a seed program, or by deleting some tokens from it. I want to parse such invalid programs into incomplete parse trees. Take the following code snippet as an example:

    {
     printf("hello");
     int
    }

There is no identifier after int.

Can I force Antlr to parse it into a partially correct tree like this?

- code snippet
  - LeftBrace             {
  - ExpressionStatement   printf("hello");
  - unknown node          int
  - RightBrace            }

Another example:

    {
     printf("hello");
    }(

There is a redundant ( after the statement. Here is what I want:

- code snippet
  - LeftBrace             {
  - ExpressionStatement   printf("hello");
  - RightBrace            }
  - unknown node          (

CodePudding user response:

Depending upon the degree to which you want to recognize "bad" input, ANTLR does this by default.

One of ANTLR's features is its error recovery process, which does just this sort of token insertion and token skipping in an attempt to parse your input. The DefaultErrorStrategy will ignore or insert a single token in an effort to recover. If that doesn't work, it consumes tokens (ignoring them) until it finds a valid "next token" and then continues processing. It should be clear that the more the input deviates from valid input, the less we should expect from error recovery.
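To make the "skip tokens until parsing can resume" idea concrete, here is a toy Python sketch. It is not ANTLR's code and does not use the ANTLR runtime: the two-rule grammar, the `parse` function, and the "unknown node" label are all made up for illustration, and the recovery is much cruder than what DefaultErrorStrategy does. It only shows how a parser can keep going after an error and still hand back a tree, with the offending tokens kept as their own nodes.

```python
import re

# Toy token set (an assumption for this sketch): strings, identifiers, punctuation.
TOKEN_RE = re.compile(r'\s*(?:(?P<STRING>"[^"]*")|(?P<ID>[A-Za-z_]\w*)|(?P<PUNCT>[{}();]))')

# Toy statement rules, written as sequences of token kinds.
STATEMENTS = {
    "ExpressionStatement": ["ID", "(", "STRING", ")", ";"],
    "Declaration": ["int", "ID", ";"],
}


def tokenize(src):
    """Return (kind, start, end) triples; keywords and punctuation use their text as kind."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        text = m.group(m.lastgroup)
        kind = text if m.lastgroup == "PUNCT" or text == "int" else m.lastgroup
        tokens.append((kind, m.start(m.lastgroup), m.end()))
        pos = m.end()
    return tokens


def parse(src):
    """Parse '{ statement* }' and return (tree, errors) even when the input is invalid."""
    tokens = tokenize(src)
    kinds = [kind for kind, _, _ in tokens]
    tree, errors, i = [], [], 0

    def text(a, b):
        # Source text covered by tokens[a:b].
        return src[tokens[a][1]:tokens[b - 1][2]]

    def unknown(j, why):
        # Record the error, but keep the token in the tree instead of dropping it.
        errors.append(f"{why} {text(j, j + 1)!r} at offset {tokens[j][1]}")
        tree.append(("unknown node", text(j, j + 1)))

    if kinds[:1] == ["{"]:
        tree.append(("LeftBrace", "{"))
        i = 1
    else:
        errors.append("missing '{' at start")

    while i < len(tokens) and kinds[i] != "}":
        for label, pattern in STATEMENTS.items():
            if kinds[i:i + len(pattern)] == pattern:
                tree.append((label, text(i, i + len(pattern))))
                i += len(pattern)
                break
        else:
            # Recovery: no statement starts here, so skip one token and try again.
            unknown(i, "unexpected")
            i += 1

    if i < len(tokens):
        tree.append(("RightBrace", "}"))
        i += 1
    else:
        errors.append("missing '}' at end")

    # Anything left after the closing brace is extraneous input.
    for j in range(i, len(tokens)):
        unknown(j, "extraneous")
    return tree, errors


if __name__ == "__main__":
    for sample in ('{\n printf("hello");\n int \n}', '{\n printf("hello");\n}('):
        tree, errors = parse(sample)
        for label, fragment in tree:
            print(f"- {label:<20} {fragment}")
        print("errors:", errors)
```

Run on the two snippets from the question, this yields trees of the requested shape: the stray `int` and the trailing `(` each end up as an "unknown node", and each is also reported as an error. As far as I know, in a real ANTLR 4 parse tree the tokens skipped during recovery appear as error nodes rather than under a label like this one.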

It will, of course, still report errors at those points in the input, but I would assume you want to retain that behavior. If not, you can put your own ErrorHandler in place and override it.

You can also provide your own implementation of ANTLRErrorStrategy, possibly by extending DefaultErrorStrategy, which is already very good and has been a focus of much attention in ANTLR's development.

Error recovery is covered pretty extensively in The Definitive ANTLR 4 Reference. If you're going to do much in-depth work with ANTLR (as your question implies), I'd suggest this book is pretty much "mandatory reading".
