I came across supposedly thread-safe code that both I and gcc thread sanitizer don't consider really thread-safe.
The code is something along these lines:
class thread_safe
{
public:
thread_safe(uint64_t size=0)
: size_{size}
{}
uint64_t get_size()
{
if(size_ == 0)
{
size_ = query_size();
if(size_ == 0)
throw std::runtime_error("couldn't get size");
}
return size_;
}
private:
// not certain this even matters but maybe vtable could make it worse so placing it here
virtual uint64_t query_size() = 0;
uint64_t size_;
};
My guess is that the author didn't want to pay for std::atomic
access since query_size() is guaranteed to always return the same value.
So if I consider the "integer assignment is an atomic operation" (not certain whether this is really true or if uint64_t still falls under any such reasoning since it's 64 and not 32 bit but ok...) and that query_size() always returns the same value so per-core-cache should not be an issue... then I can squint my eyes and try to convince myself that this really is thread-safe.
So could this be considered thread safe under C in general?
(or at least under some circumstances) (maybe only under C? but I doubt that there is a difference in the standard)
Is there a solution to make assignment inside get_size() function thread safe without changing size_ type or initialization location?
EDIT:
To be clear, I'm not saying that this code is well written, just want to understand if I can present a more solid case against it.
And regarding data race the code has two states:
if (in_current_thread_I_see_size_not_equal_0)
return size_
(size_ was set in some other thread and eventually came to this core)
and
if (in_current_thread_I_see_size_equal_to_0)
size_ = set_size_in_this_thread_to_same_value_that_will_be_written_in_other_threads
return size_
(size was not set in any other threads or the value from those writes hasn't reached this threads memory/register)
So in the end for each thread the "size_ already set to != 0" is just optimization (no need to read size from somewhere that could be expensive) and even though it's a race it's not necessarily an important one.
I'm interested into finding out if either standard or at least hardware supports the reasoning of the initial author of the code.
CodePudding user response:
This is not thread safe:
if(size_ == 0)
{
size_ = query_size();
// ...
}
return size_;
One thread could be reading size_
while another thread is writing to it. The writing thread may have only completed a partial write before the reader reads the value (a value that is nonzero because of the partial write), which it then returns. The caller in the reader thread will now receive a different value than the caller in the writer thread.
The C standard does not guarantee that reading/writing a uint64_t
is atomic.
CodePudding user response:
This code and the comments took me down a rabbit hole.
It turns out that using non atomic variables and them working is a mixture of architecture, use case and luck with compiler/optimizer.
It turns out that as it was noted by some comments std::atomic<uint64_t> operations can be (almost) free on architectures where uint64_t is atomic from the architecture point of view meaning x86_64 (from Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1):
8.1.1 Guaranteed Atomic Operations
...
The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary.
...
Looking at the output assembly reads on x86_64 generate the same assembly for both std::atomic<uint64_t> and uint64_t (but code ordering changes so some optimizations seem to be disabled for std::atomic - first red flag that this only works for cases like the one above where you don't have inter variable dependencies).
Assignment emits slightly different assembly and as stated by a comment memory_order_relaxed
during assignment generates fairly similar code.
On the other hand on ARM64 LDR/STR are replaced by LDREX/STREX so I'd say that even if this code would work under -O0 (and threads not calling the same function time wise too close together) it would probably fail under any other optimization level even though size_
is always assigned the same value.
I haven't checked any 32bit architecture but based on a comment it sounds that it's similar to ARM64 differences that I saw - pushing luck.
So the bottom line is that:
- the code really does work reliably and correctly (albeit due to favorable circumstances and luck)
- is not supported by the C standard/memory model and thus is UB (as stated by the comments) even in case of such coincidental hardware support
- future instruction set improvements and/or new compiler optimizations could easily break this behavior since it doesn't satisfy C abstract machine
Makes me feel a bit better to know that it's not tsan being too strict but actual luck that code works correctly for now :)