Outside of using volatile, how can I assure that I'm querying the latest value from memory?


I understand that the compiler may choose to hold a value in cache, and that I can ensure that it reads the latest value from memory every time by using volatile, but are there other ways I can ensure that the latest value is being read without adding a type qualifier?

CodePudding user response:

You really need to change your concept of what volatile means. The easiest way to think of volatile is as "don't optimize this".

Every action taken on a volatile object must be observable, which means every read and write has to actually be performed against "memory". But volatile was defined long before there was such a thing as caches, and it has no effect on what the hardware does; it only governs what the compiler does. It forces the compiler to keep every read and every write of the variable in the generated output, in the right order relative to other volatile accesses. It doesn't change how the variable is accessed, only that it is accessed. In fact, on modern hardware you have to combine volatile accesses with specialized page table entries that largely bypass the caches, or with memory barriers and cache flushes, to make them behave as intended.
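For illustration, a minimal sketch (the flag name is made up) of what that buys you in practice: volatile forces the compiler to emit a fresh load on every iteration, but it still gives you no atomicity and no cross-thread ordering for any other data.

    // Sketch only: volatile keeps the loads in the output, but it is NOT a
    // substitute for std::atomic when sharing data between threads.
    volatile bool stop_flag = false;   // illustrative name

    void worker()
    {
        // Without volatile, the compiler may read stop_flag once, keep the
        // value in a register, and spin forever. With volatile, every
        // iteration performs a real load instruction.
        while (!stop_flag) {
            // ... do work ...
        }
    }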

What you actually need for multithreading is std::atomic. This includes all the necessary logic to deal with different memory models on different architectures.
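A minimal sketch of that approach (the names and the release/acquire choice are my own, not from the answer above):

    #include <atomic>
    #include <thread>

    std::atomic<bool> done{false};   // flag shared between threads
    int payload = 0;                 // ordinary data published via the flag

    void producer()
    {
        payload = 42;                                  // plain write
        done.store(true, std::memory_order_release);   // publish payload
    }

    void consumer()
    {
        while (!done.load(std::memory_order_acquire)) {
            // spin until the producer publishes; acquire pairs with release
        }
        // payload is guaranteed to be 42 here
    }

    int main()
    {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }

The store/load pair provides both the visibility and the ordering that volatile cannot, regardless of the target's cache hierarchy.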

CodePudding user response:

Outside of using volatile, how can I assure that I'm querying the latest value from memory?

You can't be assured that you're actually accessing memory, at least not in a portable way. Even if you use std::atomic (or atomic variables such as atomic_int in C) there is no guarantee that the value will come from memory rather than from a cache.

There are 4 cases:

  • it's not atomic, so there's no guarantee at all

  • it is atomic and the target platform isn't cache coherent (e.g. some ARM CPUs); the compiler probably has to ensure the data comes from memory, because that's the only way to guarantee atomicity.

  • it is atomic and the target platform is cache coherent (e.g. 80x86 CPUs) and therefore you probably have no reason to care if the data came from cache or memory in the first place

  • you actually do care whether the data came from cache or memory (e.g. you're writing a tool to benchmark RAM bandwidth, or to test for faulty RAM). In this case you're going to have to resort to non-portable tricks: exploiting the target-specific cache eviction policy, using inline assembly, asking the OS to make the memory "uncached", etc. (a minimal sketch of one such trick follows the list).
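As a sketch of that last case, here is one non-portable trick on x86 with SSE2 (the function and variable names are illustrative): explicitly flush the cache line before reading, so the load is served from memory.

    #include <emmintrin.h>   // _mm_clflush, _mm_mfence (x86 with SSE2 only)
    #include <cstdint>

    // Illustrative helper: read *p after evicting its cache line, forcing
    // the next load to come from memory rather than from the cache.
    uint64_t read_from_memory(const uint64_t* p)
    {
        _mm_clflush(p);   // evict the cache line containing *p
        _mm_mfence();     // ensure the flush completes before the load
        return *p;        // this load now misses the cache
    }

Even this only addresses the CPU caches; a real benchmarking or RAM-test tool also has to contend with hardware prefetching.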
