Does Java allow a volatile read to be optimized away if the value isn't needed, also removing the happens-before?


The following code sample shows a common way to demonstrate concurrency issues caused by a missing happens-before relationship.

private static /*volatile*/ boolean running = true;
    
public static void main(String[] args) throws InterruptedException {
    new Thread() {
        @Override
        public void run() {
            while (running) {
                // Do nothing
            }
        }
    }.start();
    Thread.sleep(1000);
    running = false;
}

If running is volatile, the program is guaranteed to terminate after approximately one second. However, if running isn't volatile, the program isn't guaranteed to terminate at all (since there is no happens-before relationship in this example) and that's exactly what happens in my tests.
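One legal transformation that explains the non-terminating behaviour of the non-volatile version, sketched here in Java rather than in the JIT's intermediate representation (the local variable hoisted is my own name): because there is no happens-before relationship, every read of running is allowed to return the initial value true, so the read may be hoisted out of the loop.

boolean hoisted = running;  // the field is read once ...
while (hoisted) {
    // ... and the loop spins on the stale local copy, never re-reading running
}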

According to JLS 17.4.5 one can also enforce a happens-before relationship by writing to and reading another volatile variable running2, as shown in the following code sample.

private static boolean running = true;
private static volatile boolean running2 = true;
    
public static void main(String[] args) throws InterruptedException {
    new Thread() {
        @Override
        public void run() {
            while (running2 || running) {
                // Do nothing
            }
        }
    }.start();
    Thread.sleep(1000);
    running = false;
    running2 = false;
}

The program is guaranteed to terminate after approximately one second and that's exactly what happens in my tests. However, when I put the read of the variable running2 into an empty if statement inside the while loop, as shown in the following code sample, the program doesn't terminate in my tests.

private static boolean running = true;
private static volatile boolean running2 = true;
    
public static void main(String[] args) throws InterruptedException {
    new Thread() {
        @Override
        public void run() {
            while (running) {
                if (running2) {
                    // Do nothing
                }
            }
        }
    }.start();
    Thread.sleep(1000);
    running = false;
    running2 = false;
}

The idea here is that a volatile read of running2 is like a compiler memory barrier: the compiler has to emit code that re-reads non-volatile variables, because the read of running2 might have synchronized-with a release operation in another thread. That would guarantee visibility of new values in non-volatile variables like running.

But my JVM seems not to be doing that. Is this a JVM bug, or does the JLS allow it to remove a volatile read when the value isn't needed? (It's only controlling an empty if body, so the program behaviour doesn't depend on the value read, only on creating a happens-before.)

However, I am wondering whether this optimization is actually allowed by the JLS, which guarantees a happens-before relationship for the subsequent read of running in the while-loop condition.

I thought the JLS applies to the source code, and since running2 is volatile, the effect of reading the variable shouldn't be removable by such an optimization. Is this a compiler or JVM bug, or does the specification actually allow such optimizations?

CodePudding user response:

Is this a JVM bug, or does the JLS allow it to remove a volatile read when the value isn't needed?

It's neither.

This execution is valid according to the JLS.

The second thread must finish shortly after it reads running2 == false.
But the JLS provides no guarantees about the time it takes for a write in one thread to become visible in another thread.
As a result, your program execution is valid, because it corresponds to the case when the write running2 = false takes a very long time to propagate to another thread.

By the way, on my Java version (OpenJDK 64-Bit Server VM (build 17.0.3+7-suse-1.4-x86_64, mixed mode)) the program finishes in about 1 second.
This is also a valid execution: it corresponds to the case when the write running2 = false propagates to the second thread more quickly.

P.S. You mentioned "memory barrier".
For a memory barrier there typically exists some maximum time after which a write is guaranteed to become visible to other threads.
But the JLS doesn't operate in terms of memory barriers, doesn't have to use them, and actually guarantees only this:

An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.

P.P.S. If you want to see the real assembly code that the JVM produces for your program, you can use PrintAssembly.
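(On HotSpot this is typically enabled with the -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly flags; note that printing the disassembly requires the hsdis plugin to be installed.)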

CodePudding user response:

... does the JLS allow it to remove a volatile read when the value isn't needed? (It's only controlling an empty if body, so the program behaviour doesn't depend on the value read, only on creating a happens-before.)

According to 17.4. Memory Model of the JLS:

The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.

So the JLS permits literally anything at runtime as long as the result of the execution is "legal".

By "result of the execution" the JLS means all the external actions that the program performs: operations with files and network sockets, various system calls (e.g. reading current time), etc.
I believe that 17.4.9. Observable Behavior and Nonterminating Executions of the JLS is about that (or something like that).

In your example the only external action is sleeping for 1s, so your program can be "optimized" to:

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(1000);
    }

If the answer above is correct that an infinite loop is also a legal execution, then, I guess, your program can be "optimized" to:

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(1000);  // the external action is kept
        while (true);
    }

One more time: the runtime is allowed to do anything as long as it performs the same external actions as one of the legal executions allowed by the JLS.

To clarify things more, let's get legal executions for our example.

General Algorithm

The general algorithm is described in 17.4. Memory Model of the JLS.

The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model.

So we assume that actions in each thread are executed sequentially, one-by-one.
The only difference from a single-threaded program is that for a variable accessed from multiple threads a read might return an "unexpected" value.

The rule to get all possible values for a read is this:

Informally, a read r is allowed to see the result of a write w if there is no happens-before ordering to prevent that read.

In other words, a read of some variable returns:

  • either the last write to that variable in happens-before order
  • or any write to that variable that is not related to the read by happens-before

Note that the algorithm doesn't allow anything to be "optimized out".
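To illustrate this rule, here is a minimal self-contained sketch (the class RacyReadDemo and the field x are my own names, not part of the question): the racy read of x may legally return either the initial write or the concurrent write, because neither is prevented by a happens-before ordering.

class RacyReadDemo {
    static int x = 0;                      // plain (non-volatile) shared field

    public static void main(String[] args) {
        new Thread(() -> x = 1).start();   // this write is not related to the read below
                                           // by happens-before
        System.out.println(x);             // may legally print 0 or 1
    }
}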

Legal Executions For The Example

Now let's apply the algorithm to our example to find legal executions.
(Note: for simplicity we'll omit cases like unexpected Error and termination of the program by the OS)

The main thread has no reads of shared variables, so it behaves just like a single-threaded program.
Its actions are:

new Thread(){...}.start();
Thread.sleep(1000);
running = false;
running2 = false;

The second thread is a loop with 2 reads.
So we get a sequence of actions:

read(running == true)
read(running2 == ?)
read(running == true)
read(running2 == ?)
...
read(running == false)

The sequence ends as soon as a read of running returns false.

What values can the reads return according to the JLS?
Let's first note that running2 is volatile, which means that reads and writes to it happen in a global order (called the synchronization order) and are visible to all threads in that order. (A small self-contained sketch of the resulting happens-before handoff follows the case analysis below.)
So:

  • before the write running2 = false becomes visible to the second thread:
    • running2 == true
      This is the initial write (the only visible write).

    • running == true or running == false
      A read running == ?:

      • the initial write (running = true) happens-before the read
      • the write running = false is not related by happens-before with the read

      So each read running == ? may return either of the two writes.

      Main Thread                           Thread2                 
      
      [...]                                 [...]                   
        ↓ (happens-before)                    ↓ (happens-before)    
      running = false;                   ┌- running2 == true;       
        ↓ (happens-before)               |    ↓ (happens-before)    
      running2 = false; <----------------┘  running == true | false 
                         (happens-before)
      
  • after the write running2 = false becomes visible to the second thread:
    • running2 == false
      This is the latest visible write.
    • running == false
      Because running2 == false
      => running2 = false happens-before running2 == false
      => transitively running = false happens-before running == false
      Main Thread              Thread2              
      
      [...]                                         
        ↓ (happens-before)                          
      running = false;                              
        ↓ (happens-before)     [...]                
      running2 = false;          ↓ (happens-before) 
        └--------------------> running2 == false;   
            (happens-before)     ↓ (happens-before) 
                               running == false;    
      
      As a result, the second thread finishes when the write running2 = false becomes visible to it.
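The second case above is just the standard volatile publication pattern. For reference, a minimal self-contained sketch of the same happens-before handoff (class and field names are mine): if the reading thread observes flag == true, it is guaranteed to also observe data == 42.

class VolatileHandoff {
    static int data;                       // plain field, published via the volatile flag
    static volatile boolean flag;

    public static void main(String[] args) {
        new Thread(() -> {
            data = 42;                     // ordinary write, ordered before ...
            flag = true;                   // ... this volatile write (release)
        }).start();

        new Thread(() -> {
            if (flag) {                    // volatile read; if it returns true, it
                                           // synchronizes-with the write above
                System.out.println(data);  // transitively guaranteed to print 42
            }
            // if flag is still false, this sketch simply prints nothing
        }).start();
    }
}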

To sum up, all legal executions of the second thread:

  1. can start with this sequence:
read(running == true)
read(running2 == true)
[... repeat the fragment above ...]
  2. end with:
  • either:
    ...
    read(running2 == false)
    read(running == false)
    
    This is the case when the thread sees running2 = false and then is guaranteed to see running = false.
  • or with:
    ...
    read(running == false)
    
    This is the case when the thread doesn't see running2 = false, but it sees running = false.
  • or never ends.
    This is the case when the thread sees neither running2 = false nor running = false.

If you can "optimize out" a volatile read and the result of the execution will be the same as the results of some legal executions described above, then this optimization is legal.


Regarding the AdvancedJMM_15_VolatilesAreNotFences test mentioned in the comments.

It doesn't seem to me that this test demonstrates that the compiler is allowed to remove a volatile load/store if the value isn't used.

IMO it shows that volatile is weaker than UNSAFE.storeFence() and UNSAFE.loadFence().
Basically it's a demonstration of the Roach Motel optimization: y = 1 can be moved before b = 1.

AdvancedJMM_14_SynchronizedAreNotFences is different because it uses synchronized (new Object()) {} — there are no shared variables and no happens-before relations.


P.S. @pveentjer mentioned in the comments this:

The non-normative section of the JVM does talk about visibility; so a change should become visible to other threads at some point.

Does anyone have a link and a quote to support that?
I cannot find it anywhere, but, as noted by Peter Cordes, it would be really useful to know that Java (or even only some JVMs) doesn't allow an infinite delay in visibility of a volatile write.
