Can I replace multiple volatile reads with one, given all of them are executed within ReentrantLock.lock()?


I'm looking into the implementation of ConcurrentReferenceHashMap in the Spring Framework, particularly its restructure() method:

protected final class Segment extends ReentrantLock {

    private volatile Reference<K, V>[] references; // <-- !

    private void restructure(boolean allowResize, @Nullable Reference<K, V> ref) {
        boolean needsResize;
        lock();
        try {
            //...
            boolean resizing = false;
            int restructureSize = this.references.length; // <-- !
            //...

            Reference<K, V>[] restructured =
                    (resizing ? createReferenceArray(restructureSize) : this.references);// <-- !

            for (int i = 0; i < this.references.length; i++) { // <-- !
                ref = this.references[i]; // <-- !
                //...
            }

            if (resizing) {
                this.references = restructured;
                this.resizeThreshold = (int) (this.references.length * getLoadFactor());// <-- !
            }
            //...
        }
        finally {
            unlock();
        }
    }
}

As you can see, there are multiple reads of and writes to the volatile references array here, and all of them happen within the lock()/unlock() synchronization block.

The JavaDoc of java.util.concurrent.locks.Lock, namely its Memory Synchronization section, states that

All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock, as described in Chapter 17 of The Java™ Language Specification:

  • A successful lock operation has the same memory synchronization effects as a successful Lock action.
  • A successful unlock operation has the same memory synchronization effects as a successful Unlock action.

Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects.

My question is: can I rewrite the code so that the volatile field is read once into a local variable (i.e. synchronize on the stack) and that local is used afterwards, to avoid the repeated volatile accesses? Won't that break the JMM, given that

Unsuccessful locking and unlocking operations, and reentrant locking/unlocking operations, do not require any memory synchronization effects.
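Concretely, the rewrite I have in mind looks roughly like this (just a sketch, with the same elisions as above; refs is a name I made up for the local):

private void restructure(boolean allowResize, @Nullable Reference<K, V> ref) {
    lock();
    try {
        //...
        boolean resizing = false;
        Reference<K, V>[] refs = this.references; // the only volatile read
        int restructureSize = refs.length;
        //...

        Reference<K, V>[] restructured =
                (resizing ? createReferenceArray(restructureSize) : refs);

        for (int i = 0; i < refs.length; i++) {
            ref = refs[i];
            //...
        }

        if (resizing) {
            this.references = restructured; // the volatile write stays
            this.resizeThreshold = (int) (restructured.length * getLoadFactor());
        }
        //...
    }
    finally {
        unlock();
    }
}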

CodePudding user response:

My question is: can I rewrite the code so that the volatile field is read once into a local variable (i.e. synchronize on the stack) and that local is used afterwards, to avoid the repeated volatile accesses? Won't that break the JMM ...

You can do that, but you probably shouldn't.

You can do that because all writes to references (everywhere, not just inside the restructure() method) occur inside lock()/unlock() blocks.
As you noted, these lock()/unlock() blocks enforce the same memory synchronization semantics as the built-in monitor lock.
And built-in monitors (i.e. synchronized) provide visibility and atomicity guarantees that are stronger than those of volatile.
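To illustrate (a minimal made-up class, not Spring code): as long as every access to a field happens while holding the same lock, the unlock()/lock() pairs alone create the required happens-before edges, even without volatile:

import java.util.concurrent.locks.ReentrantLock;

class LockGuardedBox<T> extends ReentrantLock {

    private T value; // deliberately NOT volatile

    void set(T v) {
        lock();
        try {
            value = v;
        }
        finally {
            unlock(); // this unlock() happens-before any subsequent lock()
        }
    }

    T get() {
        lock(); // happens-after the writer's unlock(), so the write is visible
        try {
            return value;
        }
        finally {
            unlock();
        }
    }
}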

You probably shouldn't do that, because the very fact that you are asking whether it will break the JMM suggests you aren't entirely sure how the JMM works.
Meanwhile, references in ConcurrentReferenceHashMap is actually used in code with data races: for example, here is the data race between the writes into references[i] in restructure() and the reads of references[index] in getReference():

    public Reference<K, V> getReference(@Nullable Object key, int hash, Restructure restructure) {
        ...
        Reference<K, V>[] references = this.references;
        ...
        Reference<K, V> head = references[index];
        ...
    }

    private void restructure(boolean allowResize, @Nullable Reference<K, V> ref) {
        ...
        lock();
        try {
            ...
            Reference<K, V>[] restructured =
                    (resizing ? createReferenceArray(restructureSize) : this.references);
            ...
            if (!resizing) {
                restructured[i] = null;
            }
            ...
            restructured[index] = this.referenceManager.createReference(
                                    entry, ref.getHash(), restructured[index]);
            ...
        }
        finally {
            unlock();
        }
    }

It's possible to write code that has data races and still works correctly, but then you have to know precisely, and in full detail, how both the JMM and your concurrent algorithm work.
Otherwise there is a high chance of introducing one or more synchronization bugs, which are the worst kind: they are counter-intuitive and cannot be unit-tested (or even reliably reproduced).
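Here is a classic textbook example of such a counter-intuitive bug (unrelated to Spring): without volatile and without a lock, the reader thread may never observe the write, because the JIT is allowed to hoist the read out of the loop:

class StopFlag {

    static boolean stop; // no volatile, no lock: reads and writes race

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // the JIT may compile this to: if (!stop) while (true) {}
            }
        });
        worker.start();
        Thread.sleep(100);
        stop = true;   // the worker may never see this write...
        worker.join(); // ...and this join may hang forever
    }
}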

CodePudding user response:

What would be the purpose of this optimization? Did you write a benchmark, and did you run it with a profiler, to determine that this code is actually a bottleneck? Please check out JMH for writing microbenchmarks.
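For example, a minimal JMH sketch for this very question (class and field names are made up) could compare re-reading the volatile field against caching it in a local:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class VolatileReadBench {

    volatile int[] data = new int[1024];

    @Benchmark
    public int repeatedVolatileReads() {
        int sum = 0;
        for (int i = 0; i < this.data.length; i++) {
            sum += this.data[i]; // re-reads the volatile field every iteration
        }
        return sum;
    }

    @Benchmark
    public int singleVolatileRead() {
        int[] local = this.data; // one volatile read, then plain reads
        int sum = 0;
        for (int i = 0; i < local.length; i++) {
            sum += local[i];
        }
        return sum;
    }
}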

First of all, you already have a lock. If that lock is contended for whatever reason, the overhead of the context switches is a lot higher than the potential overhead of the volatile variables.

Even if the lock isn't contended, that doesn't mean a volatile variable is expensive. E.g. a volatile write followed by a volatile read of a different variable on x86 requires an expensive [StoreLoad] barrier, which prevents the CPU from executing loads until the store buffer has been drained. This [StoreLoad] is needed to preserve sequential consistency; otherwise the volatile write and the volatile read (to a different address) could be reordered.

But if you have multiple successive volatile writes followed by a volatile read (of a different variable), then on x86 only one [StoreLoad] is needed, between the last volatile write and the volatile read, because stores are not reordered with each other. So from a CPU memory-fence perspective the preceding volatile writes are free, and a volatile write can be pretty cheap.
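To make the fence placement concrete, here is a sketch (with made-up fields) of what the JIT has to emit on x86:

public class FencePlacement {

    static volatile int a, b, c;

    static int writesThenRead() {
        a = 1; // volatile store: on x86 a plain MOV, stores stay ordered
        b = 2; // volatile store: still no fence between the two stores
               // the JIT emits the [StoreLoad] barrier here (e.g. a
               // lock-prefixed instruction), before the next volatile load
        return c; // volatile load: acquire semantics come for free on x86
    }
}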

On x86 volatile reads are quite cheap as well: every load on x86 is an acquire load, and an acquire load is sufficient to implement sequential consistency. Keep in mind that on modern CPUs caches are always coherent, and if the cache line is already in the right state on the local CPU, a volatile read is just as cheap as a regular read from a CPU memory-fence perspective. So 1 volatile read followed by 9 regular reads should probably not perform any differently than 10 volatile reads.

The primary source of optimizations that are prevented by volatile is the JIT.
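For example (a made-up class): the JIT may fold repeated plain reads into a single load, but it must keep every volatile read:

class JitFolding {

    int plain;
    volatile int vol;

    int foldable() {
        return plain + plain + plain; // may be compiled to a single load of `plain`
    }

    int notFoldable() {
        return vol + vol + vol; // must remain three separate loads of `vol`
    }
}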

For more information see this great post: https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

[Update] The following loop is the one I would certainly have a look at:

for (int i = 0; i < this.references.length; i++) { // <-- !
    ref = this.references[i]; // <-- !
    //...
}

I would transform it to:

Reference<K, V>[] localReferences = this.references;
for (int i = 0; i < localReferences.length; i++) {
    ref = localReferences[i];
    //...
}

I guess some code optimizations would then become possible. E.g. perhaps the loop could be unrolled, and perhaps multiple loads from localReferences could execute in parallel thanks to the superscalar capabilities of modern processors. So I would definitely create a microbenchmark and see what is going on. JMH has support for various profilers, and you can look at the generated assembly as well.
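For example, assuming the hypothetical VolatileReadBench from above, it could be run with the perfasm profiler (Linux, requires perf) to inspect the generated assembly:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchRunner {

    public static void main(String[] args) throws Exception {
        Options opts = new OptionsBuilder()
                .include(VolatileReadBench.class.getSimpleName())
                .addProfiler("perfasm") // prints the hottest generated assembly
                .build();
        new Runner(opts).run();
    }
}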
