Antlr4: Make a restriction for XML tags to have the same value

Time:12-06

Currently the XML grammar does not require the opening and closing tag names of an element to be the same:

element     :   '<' Name attribute* '>' content '<' '/' Name '>'

So it will happily match <boo>text</bar>.

Is the Antlr4 grammar itself the wrong place to require "Name" to be the same on both sides of "content"? If so, is the right approach to use a listener/visitor to report that sort of inconsistency?

CodePudding user response:

Yes, you are correct: in a grammar you specify rules that conform to the formal grammar of the (programming) language. In the stage after parsing, you perform semantic checks to see whether the resulting parse tree also satisfies the other rules of the language. In ANTLR, you can indeed use a visitor or listener to perform such semantic checks.

CodePudding user response:

Name matching is a semantic concern. ANTLR is concerned with syntax and with accurately representing the structure of your source document.

Rather than capturing this in the grammar, you could just write a Listener that overrides the enterElement() (or exitElement() if you prefer) method and checks that the two Names are equivalent. Once you take namespaces into account, the check may not be as simple as a plain string comparison (I'd have to check the spec to verify the rules on that). If you detect a mismatch, you have the luxury of wording your own specific error message detailing how the names don't match.
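To make the listener idea concrete, here is a minimal sketch in plain Python. It does not use the ANTLR runtime: Element is a hypothetical stand-in for the element context class ANTLR would generate, and walk() is a hypothetical stand-in for the tree walker, so that the example runs on its own. With a real generated parser you would instead subclass the generated listener and, I believe, read the two tag names with something like ctx.Name(0) and ctx.Name(1), but treat that as an assumption to verify against your generated code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for the parse-tree context of one element."""
    open_name: str    # the Name token after '<'
    close_name: str   # the Name token after '<' '/'
    line: int
    children: List["Element"] = field(default_factory=list)


class TagMatchListener:
    """Listener-style checker: collects one message per mismatched element."""

    def __init__(self) -> None:
        self.errors: List[str] = []

    def exit_element(self, ctx: Element) -> None:
        # Plain string comparison; namespace-aware matching is not handled.
        if ctx.open_name != ctx.close_name:
            self.errors.append(
                f"line {ctx.line}: closing tag </{ctx.close_name}> "
                f"does not match opening tag <{ctx.open_name}>"
            )


def walk(listener: TagMatchListener, node: Element) -> None:
    """Depth-first walk that fires exit_element after a node's children."""
    for child in node.children:
        walk(listener, child)
    listener.exit_element(node)


# Tree for: <root><boo>text</bar></root>
tree = Element("root", "root", 1, [Element("boo", "bar", 1)])
listener = TagMatchListener()
walk(listener, tree)
for message in listener.errors:
    print(message)
```

Because the check runs after parsing, the mismatched element is still recognized as an element, and the message can name both tags and the line where the problem occurs.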

While you could add a semantic predicate to your grammar, that would prevent ANTLR from recognizing the input as an element (it clearly is an element, so there's a good argument that this would make the grammar "wrong"). ANTLR would then continue looking for another matching rule and fail with a rather unintuitive error message.

It's very tempting to try to make your grammar a complete, processable specification of your language. This often creates many more problems than it solves.
