Home > Enterprise >  How to eliminate backtracking-leading common prefix in code like {a b} a c?
How to eliminate backtracking-leading common prefix in code like {a b} a c?

Time:10-23

I am writing a recursive-discent parser for a simplified C language. The first rules I encountered (in EBNF, heavily simplified) is

program = {declaration}, {function_definition}, main_function_definition;
declaration = "int", identifier, ["=", init_value], ";";
function_definition = "int", identifier, "(", [formal_params], ")", block;
main_function_definition = "int", "main", "(", ")", block;

How to eliminate backtracking-leading common prefix "int" of program's all tree parts? I failed on using left-factoring to do so.

CodePudding user response:

How to eliminate backtracking-leading common prefix "int" of program's all tree parts?

Only way would be to hoist the common prefix into a shared production. Incidentally, I don't think you would have the parser distinguish between main and any other function declaration - that can be done in the semantic analysis phase, so taking that out you could do something like:

program = {declaration}, function_definition, {function_definition};
decl_or_funcdef_prefix = "int", identifier;
declaration =  decl_or_funcdef_prefix, ["=", init_value], ";";
function_definition = decl_or_funcdef_prefix, "(", [formal_params], ")", block;

Of course this could get messy really quickly for a realistic grammar - c-family grammars are notoriously hard to parse top-down.

CodePudding user response:

The following slightly perverse grammar works (I think) but the resulting parse tree needs to be rotated in order to associate types and names with their corresponding definition. (This is similar to what needs to be done to correct the parse tree from a algebraic expression grammar.)

The main complication is the unnecessary (IMHO) insistence on a particular order of declarations. If you instead allowed declarations to be freely interspersed, you'd have a much easier task; you could do a simple left-factoring as in the answer by @500 - Internal Server Error. This is essentially the strategy used in the grammar in the C standard.

Alternatively, instead of relying on the grammar to reject undesired declaration orders, you could do the check in the associated semantic action, which would allow you to issue a more informative error. ("Variables must be declared before functions" instead of "Unexpected '='".)

program = "int", [decls], "main", "(", ")", body;
decls = identifier, {var-decl}, {func-decl};
func-decl = "(", ")", [formal-params], body, "int", [identifier];
var-decl = ["=", expr], ";", "int", [identifier];
  • Related