Here is a simple but very common grammar rule case in EBNF format, the Statements
is a none terminal symbol and Statement
is none terminal symbol:
Statements ::= (Statement ';')*
After converting this rule to NFA, and then do the subset contruction for converting the NFA to DFA, and at last get the dfa:
State0 -> Statement -> State1 -> ';' ->State0
State0 -> ε -> State0
The State0
is the DFA's start state representing the none terminal symbol Statements
, also it is the finish state.
From State0
input Statement
and traslate to State1
and input ';' at State1
, translate to State0
.
Also, State0
could translate to self with the ε
.
And after converting the above dfa to regular grammar following the algorithm in dragon book, i get the following grammar rules:
Statements -> ε
Statements -> Statement Extend_NT
Extend_NT -> ';' Statements
It added the new none terminal symbol Extend_NT
, but i want to get the following the regular grammars which does not contain the extend symbol Extend_NT
:
Statements -> ε
Statements -> Statement ';' Statements
So the question is that is there any algorithm could get the above result that does not contain the new none terminal symbol Extend_NT
?
Or it is just a engineering problem?
CodePudding user response:
I'm not really sure I understand your question.
If you just have a single production for a non-terminal and that production just has a single repetition operator, either at the beginning or the end, you can apply a simple desugaring: (Here α
and β
are sequences of grammar symbols (but not EBNF symbols), and α
might be empty.)
N ::= α (β)* ⇒ N → α | N β
N ::= (β)* α ⇒ N → α | β N
If α
is empty, you could use either of the above. For an LALR(1) parser generator, the left-recursive version would be the usual choice; if you're building an LL parser, you will of course want the right-recursive version.
If there is more than one EBNF operator in the right-hand side, then you will need extra non-terminals, one for each EBNF repetition operator, except possibly one.
I don't know whether you'd call that an algorithm. I personally think of it as engineering, but the difference is not absolute.