JAVA/JAVAC: exact semantics of label & break

Let's consider some examples of labels and break statements in Java and try to compile them with OpenJDK (v18) javac. The main goal is to pin down the exact semantics of labels and break (and continue) statements.

Version1.

public class L {
    public static void main( String[] args) {
        System.out.println( "Start\n");
Label1:
Label2:
        break Label1;
        //break Label2;
        System.out.println( "Finish\n");
    }
}

Compile and disassemble.

$ javac L.java && echo $?
0
$ javap -c L
    ...
    Code:
       0: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #13                 // String Start\n
       5: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
      11: ldc           #21                 // String Finish\n
      13: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      16: return
}

There are no goto instructions in the bytecode. (Possibly this is the result of an optimization during bytecode generation: the AST of this example does contain JCLabeledStatement and JCBreak tree nodes.) Now change the example in two ways.

Version2.

        //break Label1;
        break Label2;

It compiles and behaves just like Version1.

Version3.

        break Label1;
        break Label2;
...
$ javac L.java
L.java:8: error: undefined label: Label2
        break Label2;
        ^
1 error

Is Version3 incorrect? Is it a feature that javac compiles Version1 and Version2? Or are Version1 and Version2 both incorrect too, with OpenJDK's javac failing to report errors that the Java Language Specification requires?

POSSIBLE EXPLANATION.

Label1:
Label2:
    break Label1;

means

Label1: {
    Label2: {
        break Label1;
    }
}

and Version3 means

Label1: {
    Label2: {
        break Label1;
    }
}
break Label2;

Of course, break Label2 then tries to use Label2 outside of its scope. (Numerous tutorials about break with a label concentrate on for/while loops.)

CodePudding user response：

Let's look at how the statements are parsed. The syntax of a labeled statement is:

LabeledStatement:
    Identifier : Statement
LabeledStatementNoShortIf:
    Identifier : StatementNoShortIf

A labeled statement consists of a label followed by exactly one contained statement.

So in the first two cases,

Label1:
Label2:
break Label2;

is parsed as a single labeled statement, with Label1 as its label and another labeled statement as its contained statement. The contained labeled statement has Label2 as its label and a break statement as its contained statement.

The AST looks a bit like this:

LabeledStatement
    Identifier ('Label1')
    ':'
    LabeledStatement
        Identifier ('Label2')
        ':'
        BreakStatement
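
To check this nesting against javac itself, here is a minimal sketch that parses a source string with the JDK's Compiler Tree API (com.sun.source) and prints only the labeled-statement and break nodes. The class name LabelAstDump and the helper dump are made up for illustration, and the sketch assumes it runs on a full JDK (not a JRE) so that ToolProvider.getSystemJavaCompiler() returns a compiler.

```java
import com.sun.source.tree.BreakTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LabeledStatementTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper class, not part of the original question or answer.
public class LabelAstDump {

    // Parses the given source (no attribution, no code generation) and returns
    // an indented listing of the labeled statements and breaks it contains.
    static String dump(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // In-memory source file; the URI is only a name for the compiler.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///L.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(file));

        StringBuilder out = new StringBuilder();
        // The Integer argument carries the current nesting depth of labels.
        TreeScanner<Void, Integer> scanner = new TreeScanner<>() {
            @Override
            public Void visitLabeledStatement(LabeledStatementTree node, Integer depth) {
                out.append("  ".repeat(depth))
                   .append("LabeledStatement ").append(node.getLabel()).append('\n');
                // Descend into the contained statement one level deeper.
                return super.visitLabeledStatement(node, depth + 1);
            }

            @Override
            public Void visitBreak(BreakTree node, Integer depth) {
                out.append("  ".repeat(depth))
                   .append("Break ").append(node.getLabel()).append('\n');
                return super.visitBreak(node, depth);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, 0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String version2 = "public class L {"
                + "  public static void main(String[] args) {"
                + "    Label1: Label2: break Label2;"
                + "  }"
                + "}";
        System.out.print(dump(version2));
    }
}
```

For the Version2 source, the listing should show the Break nested two levels deep, inside Label2, which is itself inside Label1.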

In the third case,

Label1:
Label2:
break Label1;
break Label2;

is parsed as two statements. The first statement has the same structure as above, and the second statement is a break statement.

The spec also says:

The scope of a label of a labeled statement is the immediately contained Statement.

This explains why the third case does not compile. The scope of the Label2 label does not include the second statement (break Label2;). It only includes the single statement that immediately follows the label. In the first two cases, both labels are in scope, because they are used inside the contained statement.
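
The scope rule is easier to see when the contained statement is a block rather than a bare break. The following is only an illustrative sketch (the class LabelScopeDemo and the method trace are invented names): both labels are visible inside the inner block, and each break ends exactly the labeled statement it names.

```java
// Hypothetical demo class, not taken from the original question.
public class LabelScopeDemo {

    // Returns a trace of the statements that actually ran.
    static String trace(boolean leaveOuter) {
        StringBuilder sb = new StringBuilder();
        outer: {
            inner: {
                sb.append("a");
                if (leaveOuter) {
                    break outer;   // "outer" is in scope: we are inside its statement
                }
                break inner;       // "inner" is in scope for the same reason
            }
            sb.append("b");        // skipped only when "break outer" was taken
        }
        // break inner;            // would not compile here: undefined label
        sb.append("c");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // prints "abc"
        System.out.println(trace(true));  // prints "ac"
    }
}
```

Uncommenting the break after the outer block reproduces the Version3 situation: the label is used outside the statement it labels.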

The semantics of a break statement are specified in §14.15 of the spec:

A break statement with label Identifier attempts to transfer control to the enclosing labeled statement (§14.7) that has the same Identifier as its label; this enclosing statement, which is called the break target, then immediately completes normally. In this case, the break target need not be a switch, while, do, or for statement.

In the first two cases, the break statement transfers control to its enclosing labeled statement, which then immediately completes normally. Because the break is the only thing that labeled statement contains, this is basically a no-op, and so it does not produce any bytecode.
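
As a small check of the no-op claim, here is Version1 restated as a method that returns what it would have printed. This is a sketch with made-up names (NoOpBreak, run), not the original program: if the break did anything beyond completing the labeled statement, "Finish;" would be missing from the result.

```java
// Hypothetical restatement of Version1 for illustration.
public class NoOpBreak {

    static String run() {
        StringBuilder out = new StringBuilder("Start;");
        Label1:
        Label2:
        break Label1;          // the break target is the whole Label1 statement,
                               // which simply completes normally
        out.append("Finish;"); // still reachable and still executed
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Start;Finish;"
    }
}
```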