Parsing table for LL1 grammar-CodePudding

I have to produce an LL1 grammar that covers the IF, IF-ELSE, IF - ELSE IF - ELSE condition for a C program. I was doing the parsing table and in the AUX_PROG rule I obtain two entries for the same terminal ( "}", ")" ), meaning that the grammar isn't LL(1).

Can you tell me what can I change in order to get it right?

This is my grammar with also the first and the follows:

<MAIN> ::= int main () { <AUX_PROG> }
<AUX_PROG> ::= <PROG> <AUX_PROG>
| ε
<PROG> ::= <IF_STAT>
| <AUX_OTHER>
<IF_STAT> ::= if (<AUX_OTHER>) {<AUX_PROG>} <ELSE_STAT>
<AUX_OTHER> ::= other <AUX_OTHER>
| ( <AUX_PROG> ) <AUX_OTHER>
| { <AUX_PROG> } <AUX_OTHER>
| <KEYWORD> <AUX_OTHER>
| ε
<ELSE_STAT> ::= else <AUX_ELSE_STAT>
| ε
<AUX_ELSE_STAT> ::= { <AUX_PROG> }
| if (<AUX_OTHER>) {<AUX_PROG>} else <AUX_ELSE_STAT>
<KEYWORD> ::= int
| main



follow(AUX_OTHER) ={)} U follow(PROG) = {if, other, (, {, int, main,"}", )}
follow(ELSE_STAT) = follow(IF_STAT) = {if, other, (, {, int, main,"}", )}
follow(IF_STAT) = follow(PROG) = {if, other, (, {, int, main,"}", )}
follow(PROG) = (first(AUX_PROG) - ε) U follow(AUX_PROG) = {if, other, (, {, int, main,"}", )}
follow(AUX_PROG) = {"}", )}
follow(MAIN) = {$}
follow(AUX_ELSE_STAT) = follow(ELSE_STAT) = {if, other, (, {, int, main,"}", )}
follow(KEYWORD) = (first(AUX_OTHER) - ε) U follow(AUX_OTHER) = {other, (, {, int, main, if,"}", )}



first(MAIN) = {int}
first(AUX_PROG) = {if, other, (, {, int, main, ε}
first(PROG) = {if, other, (, {, int, main, ε}
first(IF_STAT) = {if}
first(ELSE_STAT) = {else, ε}
first(AUX_ELSE_STAT) ={ "{", if }
first(AUX_OTHER) = {other, (, {, int, main, ε}
first(KEYWORD) = {int, main}

UPDATE:

I have also tried to change the AUX_OTHER production to remove that ε element but it also gives me double entries in the parsing table.

<AUX_OTHER> ::= other <AUX_PROG>
| ( <AUX_PROG> ) <AUX_PROG>
| { <AUX_PROG> } <AUX_PROG>
| <KEYWORD> <AUX_PROG>

CodePudding user response：

Your AUX_PROG rule is ambiguous, and it can be trivially proven with a few substitutions:

<AUX_PROG>, use <AUX_PROG> ::= <PROG> <AUX_PROG>
<PROG> <AUX_PROG>, use <PROG> ::= <AUX_OTHER>
<AUX_OTHER> <AUX_PROG>, use <AUX_OTHER> ::= ε
<AUX_PROG>

So we just proved that your grammar (can) define <AUX_PROG> as <AUX_PROG>, infinitely recursive.

As to how to fix it... I'm not sure, I can't read what you intend this grammar to be from this broken definition. You even have such broken rules as if() {<AUX_PROG>} <ELSE_STAT> being allowed, which doesn't make any sense to me (but maybe it's intentional? who knows).

CodePudding user response：

In this previous question which you asked almost two weeks ago, you proposed the following grammar:

<MAIN> ::= int main () { <PROG> <AUX_PROG> }
<AUX_PROG> ::= <PROG> <AUX_PROG> | ε
<PROG> ::= <IF_STAT> | other | ε
<IF_STAT> ::= if ( other ) { <PROG> } <ELSE_STAT>
<ELSE_STAT> ::= else { <PROG> } | ε

As I commented in my answer to that question, the fundamental issues with that grammar is that <PROG> ::= ε leads to an ambiguity. You can solve that problem simply by removing that alternative, since it's completely unnecessary. As a general rule, regardless of parsing technology, a repeated element in a grammar must not be nullable --that is, it must derive at least one token-- because if you can repeat an empty derivation, it's impossible to say how many repetitions there are [Note 1].

That would lead to the grammar:

<MAIN> ::= int main () { <PROG> <AUX_PROG> }
<AUX_PROG> ::= <PROG> <AUX_PROG> | ε
<PROG> ::= <IF_STAT> | other
<IF_STAT> ::= if ( other ) { <PROG> } <ELSE_STAT>
<ELSE_STAT> ::= else { <PROG> } | ε

That grammar is unambiguous and LL(1). But it's probably not the solution to your problem, because it doesn't recognise all the legal programs you're expected to recognise.

One obvious problem is that blocks inside braces ({ … }) now cannot be empty, which was probably desirable. Another one (which was already present in the original) is that the targets of if statements and else clauses, even though they are enclosed in braces, are required to be single statements, rather than a sequence of statements. But both of those issues are easy to fix; all we need to do is to use <AUX_PROG> instead of <PROG> inside blocks.

As a side note, it really helps when you're writing a grammar (or any computer program) to use meaningful identifiers. Your <PROG> is not a program, as one might expect from its name. It's a statement. And it's really impossible to know what <AUX_PROG> might mean. So in this version, I've changed the syntax names to something which I find easier to understand as a reader:

<MAIN> ::= int main () { <STATEMENT_LIST> }
<STATEMENT_LIST> ::= <STATEMENT> <STATEMENT_LIST> | ε
<STATEMENT> ::= <IF_STATEMENT> | other
<IF_STATEMENT> ::= if ( other ) { <STATEMENT_LIST> } <ELSE_CLAUSE>
<ELSE_CLAUSE> ::= else { <STATEMENT_LIST> } | ε

That's better, but it still is probably not the language you were trying to parse. As I also said in the previous answer, it's very difficult to provide good advice on that, because at no point do you manage to specify the language you are actually trying to parse. It's certainly not C, and it couldn't be solved for C, because the syntax of C's if statements does not have an LL(k) grammar for any value of k [Note 2].

The above grammar is LL(1) because it requires the targets of the if statement to be fully delimited ({ … }), and that can be described by an LL(1) grammar, as seen above. But even though most C style guides suggest that you always use that style, it's not obligatory.

Furthermore, there is one important exception to the style rule, which has to do with chained alternatives, which are quite common in real C programs:

if (x > 0) {
  /* Do something */
}
else if (y > 0) {
  /* Do something else */
else if (z > 0) {
  /* Do yet another different thing */
}
else {
  fprintf(stderr, "%s\n", "One of x, y and z must be positive.");
  return -1;
}

Note that in the above code, the targets of the else clauses, except for the last one, are not enclosed in braces. Rather, they are single if statements. The reason style guides make an exception for that case is the "always enclose targets in braces" rule leads to excessive indentation, which rapidly becomes unreadable, and obscures the logical structure of the code, which is a linear sequence of alternatives.

if (x > 0) {
  /* Do something */
}
else {
  if (y > 0) {
    /* Do something else */
  }
  else {
    if (z > 0) {
      /* Do yet another different thing */
    }
    else {
      fprintf(stderr, "%s\n", "One of x, y and z must be positive.");
      return -1;
    }
  }
}

I'm guessing that your assignment is to write a grammar which does allow the else if exception to the brace rule. And that can be achieved in a very simple way, by just adding it to the alternatives in the else clause.

In order to avoid multiple productions with the same token (else), I had to refactor <ELSE_CLAUSE>. Also, I took the opportunity to save some typing by refactoring { <STATEMENT_LIST> } into <BLOCK>, and adding it to the list of possible alternatives for <STATEMENT>.

<MAIN> ::= int main () <BLOCK>
<BLOCK> ::= { <STATEMENT_LIST> }
<STATEMENT_LIST> ::= <STATEMENT> <STATEMENT_LIST> | ε
<STATEMENT> ::= <IF_STATEMENT> | <BLOCK> | other
<IF_STATEMENT> ::= if ( other ) <BLOCK> <OPTIONAL_ELSE_CLAUSE>
<OPTIONAL_ELSE_CLAUSE> ::= else <ELSE_TARGET> | ε
<ELSE_TARGET> ::= <BLOCK> | <IF_STATEMENT>
---
That might or might not parse the language you are seeking to parse, but I hope that it illustrates some of the thought processes that go into creating simple and relatively readable grammars.

### Notes

1. If simple statements were required to be terminated with a <kbd>;</kbd>, as they are in C, then an `empty-statement` would actually consist of a single, token: `;`. That would be fine. But without the semicolon terminator, nothing remains and repeating nothing is ambiguous.

2. The fact that C `if` statements do not have an LL(1) grammar does not make them impossible to parse with a recursive descent parser, because a recursive descent parser is a program in a Turing-complete language, and it does not have to obey arbitrary rules about what is and is not an LL(1) grammar. In particular, when you write the RD parser, you are free to simply ignore the grammar's predict-conflict. In the place where you see `else` and you have the options of predicting `else <ELSE_TARGET>` or `ε`, you always chose to predict the `else` clause. That forces the `else` to match the preceding unmatching `if`, which is certainly correct.