LL1 grammar for IF-ELSE condition for a C program

I have to produce an LL1 grammar that covers the IF, IF-ELSE, and IF - ELSE IF - ELSE conditions for a C program. I was working out the follow sets and wasn't able to resolve the recursion, so I thought that maybe my grammar is wrong or doesn't satisfy the LL1 conditions.

Can you tell me if the grammar is correct?

<MAIN> ::= int main () { <PROG> <AUX_PROG> }
<AUX_PROG> ::= <PROG> <AUX_PROG> | ε
<PROG> ::= <IF_STAT> | other | ε
<IF_STAT> ::= if ( other ) { <PROG> } <ELSE_STAT>
<ELSE_STAT> ::= else { <PROG> } | ε


follow(PROG) = { "}", if, other }
follow(AUX_PROG) = { "}" }
follow(IF_STAT) = follow(PROG) = { "}", if, other }
follow(ELSE_STAT) = follow(IF_STAT) = { "}", if, other }
follow(MAIN) = { $ }

first(MAIN) = { int }
first(AUX_PROG) = { if, other, ε }
first(PROG) = { if, other, ε }
first(IF_STAT) = { if }
first(ELSE_STAT) = { else, ε }

UPDATE: I have modified the grammar and also I have included the first and the follow. The braces are required so that there is no dangling-else problem.

Answer:

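That grammar is ambiguous: because <PROG> can derive ε, the production <AUX_PROG> ::= <PROG> <AUX_PROG> lets <AUX_PROG> derive itself, so it is effectively left-recursive and every input has more than one derivation. If you eliminate the null production for <PROG>, the grammar is certainly LL(1).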
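If you want to check that claim mechanically, here is a minimal sketch in Python that computes FIRST and FOLLOW by fixed-point iteration and reports every (nonterminal, lookahead) pair that two productions would compete for. The names build_grammar, first_follow and ll1_conflicts are made up for this illustration, and an empty list stands for an ε production.

```python
EPS = "ε"

def first_of_seq(seq, first, nts):
    """FIRST of a symbol sequence; contains EPS if the whole sequence is nullable."""
    out = set()
    for sym in seq:
        if sym not in nts:            # terminal: it is the only possible first token
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:     # non-nullable nonterminal stops the scan
            return out
    out.add(EPS)
    return out

def first_follow(grammar, start):
    """Compute FIRST and FOLLOW for every nonterminal by iterating to a fixed point."""
    nts = set(grammar)
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for prod in prods:
                f = first_of_seq(prod, first, nts)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                for i, sym in enumerate(prod):
                    if sym not in nts:
                        continue
                    rest = first_of_seq(prod[i + 1:], first, nts)
                    new = rest - {EPS}
                    if EPS in rest:   # everything after sym can vanish
                        new |= follow[a]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow

def ll1_conflicts(grammar, start):
    """Return (nonterminal, lookahead) pairs predicted by more than one production."""
    nts = set(grammar)
    first, follow = first_follow(grammar, start)
    conflicts = []
    for a, prods in grammar.items():
        seen = set()
        for prod in prods:
            f = first_of_seq(prod, first, nts)
            predict = f - {EPS}
            if EPS in f:
                predict |= follow[a]
            conflicts += [(a, t) for t in predict & seen]
            seen |= predict
    return conflicts

def build_grammar(prog_nullable):
    """The grammar from the question, with or without <PROG> ::= ε."""
    prog = [["IF_STAT"], ["other"]] + ([[]] if prog_nullable else [])
    return {
        "MAIN": [["int", "main", "(", ")", "{", "PROG", "AUX_PROG", "}"]],
        "AUX_PROG": [["PROG", "AUX_PROG"], []],
        "PROG": prog,
        "IF_STAT": [["if", "(", "other", ")", "{", "PROG", "}", "ELSE_STAT"]],
        "ELSE_STAT": [["else", "{", "PROG", "}"], []],
    }

print(ll1_conflicts(build_grammar(True), "MAIN"))   # conflicts with <PROG> ::= ε
print(ll1_conflicts(build_grammar(False), "MAIN"))  # conflicts without it
```

With the null production, the conflicts show up on <AUX_PROG> and <PROG>; without it, the list should come back empty. Note that this only tests the LL(1) condition, not whether the grammar matches the language you actually want.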

But just being LL(1) does not demonstrate that the grammar correctly recognises the desired syntax, much less that it correctly parses each input into the desired parse tree. So it depends on how you define "correct". Since your question doesn't really specify either the syntax you hope to match or the form in which you would like it to be analysed, it's hard to comment on these forms of correctness.

You're absolutely correct to note that the heart of C's dangling-else issue is that C does not require the bodies of if and else clauses to be delimited. So the following is legal C:

if (condition1) if (condition2) statement1; else statement2;

and the language's rules cause else statement2 to be bound to if (condition2), rather than the first if.

That's often called an ambiguity, but it's actually easy to disambiguate. You'll find the disambiguation technique all over the place, including Wikipedia's somewhat ravaged entry on dangling else, or most popular programming language textbooks. However, the disambiguation technique does not result in an LL(1) grammar; you need to use a bottom-up parser. (Even an operator precedence parser can deal with it, but LALR(1) parsers are probably more common.)

As Wikipedia points out, a simple solution is to change the grammar to remove the possibility of if (c1) if (c2) .... A simple way to do that is to insist that the target of the if be delimited in some way, such as adding braces (which would in any case be required if the body were more than one statement). It's not necessary to put the same requirement on the body of the else clause, but that would probably be confusing for language users. However, it is convenient to permit chained if...else constructs like this:

if (c1) {
    body1
}
else if (c2) {
    body2
}
else if (c3) {
    body3
}
...

That's not ambiguous, even though the body of each else is not delimited. In some languages, that construct is abbreviated by using a special elseif token (which might be spelled elif or elsif) in order to preserve the rule that else clauses must be delimited blocks. But it's not too eccentric to simply allow else if as an exception to the general rule about bodies.
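To make that concrete, here is a rough recursive-descent sketch in Python for one possible braces-required grammar that allows else if as the single exception. The grammar, the Parser class and the whitespace-separated token format are assumptions made for this example, with other standing in for any condition or simple statement as in the question:

    stmts     ::= stmt stmts | ε
    stmt      ::= if_stat | other
    if_stat   ::= if ( other ) { stmts } else_part
    else_part ::= else else_body | ε
    else_body ::= { stmts } | if_stat

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r} at token {self.pos}")
        self.pos += 1

    def stmts(self):
        out = []
        while self.peek() in ("if", "other"):   # FIRST(stmt)
            out.append(self.stmt())
        return out

    def stmt(self):
        if self.peek() == "if":
            return self.if_stat()
        self.expect("other")
        return "other"

    def block(self):
        self.expect("{")
        body = self.stmts()
        self.expect("}")
        return body

    def if_stat(self):
        for tok in ("if", "(", "other", ")"):
            self.expect(tok)
        then = self.block()
        els = None
        if self.peek() == "else":
            self.expect("else")
            # One token of lookahead decides: "if" chains, "{" opens a block.
            els = [self.if_stat()] if self.peek() == "if" else self.block()
        return ("if", then, els)

def parse(source):
    p = Parser(source.split())
    tree = p.stmts()
    p.expect("$")
    return tree

print(parse("if ( other ) { other } else if ( other ) { } else { other }"))
```

Every decision in the parser looks at exactly one token, which is what you would expect from an LL(1) grammar: after else, the next token is either if or {, and nothing else is accepted. An undelimited body such as if ( other ) other is rejected with a ParseError.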

So if you're designing a language, you have options. If you're implementing someone else's language (such as the one given by the instructor of a course), you need to make sure you understand what their requirements are.