Why does std::barrier allocate memory on the heap while std::latch doesn't?
The main difference between them is that std::barrier can be reused while std::latch can't, but I can't find an explanation of why this would make the former allocate memory.
CodePudding user response:
While it is true that a naive barrier could be implemented with a constant amount of storage as part of the std::barrier object, real-world barrier implementations use structures that need more than O(1) storage but have better concurrency properties.
For example, the GCC libstdc++ implementation uses a tree barrier, which avoids having a large number of threads contending on the same counter. Such contention can make releasing threads cost O(N) with the naive barrier, whereas a tree barrier propagates the "barrier done, time to release" signal in logarithmic time, at the expense of needing linear space to represent the thread tree.
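To show where the contention comes from, here is a minimal sketch of the naive approach: a sense-reversing barrier built from a single shared counter and a single shared flag. This is an illustration of the naive design the answer contrasts against, not the libstdc++ implementation; with N threads, all N spin on the same cache line, which is exactly what a tree barrier avoids.

```cpp
#include <atomic>
#include <cstddef>

// Naive sense-reversing barrier: O(1) storage, but every waiter
// polls the same shared flag, so releasing N threads generates
// O(N) coherence traffic on one cache line.
class naive_barrier {
    const std::ptrdiff_t expected_;
    std::atomic<std::ptrdiff_t> count_;
    std::atomic<bool> sense_{false};

public:
    explicit naive_barrier(std::ptrdiff_t n) : expected_(n), count_(n) {}

    void arrive_and_wait() {
        // Each phase targets the opposite of the current sense.
        const bool my_sense = !sense_.load(std::memory_order_relaxed);
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last arriver: reset the counter for the next phase,
            // then flip the sense, releasing every waiter at once.
            count_.store(expected_, std::memory_order_relaxed);
            sense_.store(my_sense, std::memory_order_release);
        } else {
            // All other threads spin on the single shared flag --
            // this is the contention point a tree barrier removes.
            while (sense_.load(std::memory_order_acquire) != my_sense) {}
        }
    }
};
```

A tree barrier instead arranges threads into small groups, each spinning on its own node, so the release signal fans out level by level instead of hitting one location.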
It's not too difficult to imagine enhanced tree-style barrier implementations that are aware of the cache/socket/memory-bus hierarchy and group threads in the tree based on their physical location, keeping cross-core, cross-die, and cross-socket polling to the minimum required.
On the other hand, I'm not quite sure why a latch would be forbidden to allocate. cppreference states "The latch class is a downward counter of type std::ptrdiff_t", which would suggest it is just a counter (with no room for a pointer to an allocated object), but [thread.latch] says nothing more than "A latch maintains an internal counter that is initialized when the latch is created", without forbidding it from maintaining other state. From a design standpoint, though, it does seem like a much more minimal and lightweight synchronization tool.
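Indeed, "just a counter" is enough for a latch. Here is a minimal sketch (not any real library's implementation) showing why no allocation is needed: a single atomic counter covers the whole API. A real implementation would block in wait() via std::atomic::wait or a futex rather than spin; this sketch spins for brevity.

```cpp
#include <atomic>
#include <cstddef>

// Minimal latch sketch: one atomic counter, no heap allocation.
// Unlike a barrier, a latch never resets, so there is no "phase"
// state to track and no release fan-out to optimize.
class naive_latch {
    std::atomic<std::ptrdiff_t> count_;

public:
    explicit naive_latch(std::ptrdiff_t n) : count_(n) {}

    void count_down(std::ptrdiff_t n = 1) {
        count_.fetch_sub(n, std::memory_order_acq_rel);
    }

    bool try_wait() const {
        return count_.load(std::memory_order_acquire) <= 0;
    }

    void wait() const {
        while (!try_wait()) {}  // spin; real impls block instead
    }
};
```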