Can I define the XML syntax using an operator precedence grammar?

Time:09-09

Let's focus on the following productions, under these assumptions:

  • any identifier having an Uppercase initial character is a Terminal (Misc, CharData, Reference, CDSect, PI, Comment)
  • otherwise (lowercase initial character), a nonterminal (document, prolog, element)
[1]  document      ::=      prolog element Misc*
[39] element       ::=      STag content ETag
[43] content       ::=      CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

I want to rewrite this as an operator precedence grammar, but I could not complete the rule for content. How can I define it?

document : prolog element
         | prolog element misc
         ;
misc     : misc Misc
         | Misc
         ;
element  : STag ETag
         | STag content ETag
         ;

Answer:

That grammar is not an operator grammar, so attempting to write an operator precedence parser for it is bound to fail. If you really want to pursue that project, you'll need to rewrite the grammar.

Recall that there are two essential features of an operator grammar:

  • Every production includes at least one operator (terminal).
  • No production includes two consecutive non-terminals.

The first rule prohibits empty and unit productions. Those can be mechanically eliminated at the cost of bloating the grammar.
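To make the two conditions concrete, here is a minimal sketch in Python that checks them against the partial grammar from the question. The dictionary encoding of the grammar and the function names are assumptions made for illustration; the terminal test reuses the question's convention that an uppercase initial marks a terminal.

```python
def is_terminal(symbol):
    # Convention from the question: an uppercase initial marks a terminal.
    return symbol[0].isupper()


def operator_grammar_violations(grammar):
    """Return (lhs, rhs, reason) for every production that breaks one of
    the two operator-grammar conditions."""
    violations = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            # Condition 1: at least one terminal on the right-hand side.
            # An empty or unit production fails this automatically.
            if not any(is_terminal(s) for s in rhs):
                violations.append((lhs, rhs, "no terminal"))
            # Condition 2: no two consecutive non-terminals.
            for left, right in zip(rhs, rhs[1:]):
                if not is_terminal(left) and not is_terminal(right):
                    violations.append((lhs, rhs, "adjacent non-terminals"))
                    break
    return violations


# The partial grammar from the question (hypothetical encoding:
# each non-terminal maps to a list of alternative right-hand sides).
GRAMMAR = {
    "document": [["prolog", "element"], ["prolog", "element", "misc"]],
    "misc": [["misc", "Misc"], ["Misc"]],
    "element": [["STag", "ETag"], ["STag", "content", "ETag"]],
}

for lhs, rhs, reason in operator_grammar_violations(GRAMMAR):
    print(f"{lhs} : {' '.join(rhs)}  ({reason})")
```

Under this encoding, both alternatives of document are reported for both conditions, while misc and element pass as written.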

The second rule prohibits right-hand sides like document: prolog element. More critically, it won't let you use element as a non-terminal, because the language itself permits juxtaposed elements. Working around that should be possible: every element starts and ends with a terminal, so you should be able to eliminate element from the grammar by macro-replacing each of its uses with its definition. But that is also going to be tedious. (Also, I'm not convinced that making STag and ETag terminals really reflects the syntax; a start tag is a syntactically complicated object which must somehow be parsed.)
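As a rough illustration of that macro-replacement, the Python sketch below compares a naive left-recursive encoding of content with one where element has been inlined. Both encodings are assumptions of mine, not a finished grammar: the inlined version accepts a slight superset (two CharData tokens in a row), and passing these two structural checks does not by itself show that conflict-free precedence relations exist.

```python
def is_terminal(symbol):
    # Convention from the question: an uppercase initial marks a terminal.
    return symbol[0].isupper()


def is_operator_form(rhs):
    """True if rhs has at least one terminal and no adjacent non-terminals."""
    has_terminal = any(is_terminal(s) for s in rhs)
    has_adjacent = any(
        not is_terminal(a) and not is_terminal(b) for a, b in zip(rhs, rhs[1:])
    )
    return has_terminal and not has_adjacent


# Naive encoding of the repetition: element stays a non-terminal, so the
# rules are either unit productions or contain two adjacent non-terminals.
NAIVE_CONTENT = [["element"], ["content", "element"]]

# Inlined encoding: every use of element is replaced by its two definitions,
# so each item in the sequence begins and ends with a terminal.
ITEMS = [
    ["CharData"], ["Reference"], ["CDSect"], ["PI"], ["Comment"],
    ["STag", "ETag"],
    ["STag", "content", "ETag"],
]
INLINED_CONTENT = ITEMS + [["content"] + item for item in ITEMS]

print(all(is_operator_form(rhs) for rhs in NAIVE_CONTENT))
print(all(is_operator_form(rhs) for rhs in INLINED_CONTENT))
```

The inlined list already has fourteen alternatives for a single rule, which gives a sense of the bloat involved.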

Once you've done all that, you'll need to cope with the essential context-sensitivity of XML, which results from the need for agreement between start and end tags. Most people simply redefine that as "semantic", in order to be able to use a context-free grammar, but it is still required for a correct parse.
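The "semantic" treatment of tag agreement can be sketched as a separate pass with a stack of open tag names. The token representation below, a list of (kind, value) pairs, is a hypothetical one chosen for the example; a real tokenizer would have to parse each tag to extract its name.

```python
def tags_agree(tokens):
    """Check that every ETag closes the most recently opened STag.

    tokens is an iterable of (kind, value) pairs; for STag and ETag the
    value is the element name, and other kinds are ignored here.
    """
    open_names = []
    for kind, value in tokens:
        if kind == "STag":
            open_names.append(value)
        elif kind == "ETag":
            # An end tag with nothing open, or with the wrong name, is an error.
            if not open_names or open_names.pop() != value:
                return False
    # Any tag still open at the end is also an error.
    return not open_names


well_formed = [
    ("STag", "a"), ("CharData", "x"),
    ("STag", "b"), ("ETag", "b"),
    ("ETag", "a"),
]
mismatched = [("STag", "a"), ("STag", "b"), ("ETag", "a"), ("ETag", "b")]

print(tags_agree(well_formed))
print(tags_agree(mismatched))
```

A context-free grammar would accept both token sequences, since it sees only STag and ETag; the name check is what rejects the second one.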
