The operation pseudocode for cmpxchg is as follows (Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M, 2010):
IF accumulator = DEST
    THEN
        ZF ← 1;
        DEST ← SRC;
    ELSE
        ZF ← 0;
        accumulator ← DEST;
FI;
At first sight, at least, the accumulator changes its value if (and only if) ZF = 0.
So, is it safe to ignore ZF entirely and just watch for a change in the accumulator value to judge whether the operation was successful?
In other words, can I safely use variant #2 instead of #1?
#1:
mov eax, r8d
lock cmpxchg [rdx], ecx
jz @success
#2:
mov eax, r8d
lock cmpxchg [rdx], ecx
cmp eax, r8d
jz @success
I mean, are there any special cases in which only ZF can reliably show whether the operation succeeded? It might be a trivial question, but lock-free multitasking is almost impossible to debug, so I have to be 101% sure.
CodePudding user response:
Your reasoning looks correct to me.
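To make that concrete, here is a minimal sketch in C that models the pseudocode from the question. It is not atomic and not how the hardware is implemented; the names `cmpxchg_result`, `model_cmpxchg`, `success_by_zf` and `success_by_value` are hypothetical and exist only for illustration. It shows that checking ZF (variant #1) and comparing the accumulator against the saved expected value (variant #2) give the same answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, non-atomic model of the CMPXCHG pseudocode (illustration only). */
typedef struct {
    uint32_t accumulator; /* value of EAX after the instruction */
    bool zf;              /* value of ZF after the instruction */
} cmpxchg_result;

static cmpxchg_result model_cmpxchg(uint32_t *dest, uint32_t accumulator, uint32_t src)
{
    cmpxchg_result r;
    if (accumulator == *dest) {
        r.zf = true;
        *dest = src;                  /* DEST <- SRC */
        r.accumulator = accumulator;  /* accumulator is left unchanged */
    } else {
        r.zf = false;
        r.accumulator = *dest;        /* accumulator <- DEST, which differs from the old accumulator */
    }
    return r;
}

/* Variant #1: trust ZF. */
static bool success_by_zf(cmpxchg_result r)
{
    return r.zf;
}

/* Variant #2: compare the accumulator with the saved expected value (R8D in the question). */
static bool success_by_value(cmpxchg_result r, uint32_t expected)
{
    return r.accumulator == expected;
}
```

On the failure path the accumulator receives DEST, which by definition was not equal to the expected value, so the comparison in variant #2 can only succeed when ZF was set.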
Spending an extra instruction to regenerate ZF won't cause a correctness problem, and it only costs code size if the cmp can macro-fuse with the JCC. It does cost you an extra register, though, because you have to keep the expected value somewhere else (R8D here) instead of only having it in EAX, where cmpxchg replaces it on failure.
This might be why it's ok for GNU C's old-style __sync builtins (obsoleted by __atomic builtins that take a memory-order parameter) to only provide __sync_val_compare_and_swap and __sync_bool_compare_and_swap, which return the value or the boolean success result, with no single builtin that returns both.
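As a rough illustration, the two __sync builtins can be wrapped so that both report success the same way. The wrapper names `cas_via_val` and `cas_via_bool` are hypothetical; only the builtins themselves come from GNU C. This sketch assumes a GCC-compatible compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Uses the value-returning builtin and derives success by comparing,
   much like variant #2 in the question. */
static bool cas_via_val(uint32_t *ptr, uint32_t expected, uint32_t desired)
{
    uint32_t old = __sync_val_compare_and_swap(ptr, expected, desired);
    return old == expected;
}

/* Uses the boolean builtin directly, much like variant #1 (branch on ZF). */
static bool cas_via_bool(uint32_t *ptr, uint32_t expected, uint32_t desired)
{
    return __sync_bool_compare_and_swap(ptr, expected, desired);
}
```

Both wrappers should agree on whether the swap happened; the value-returning form additionally gives you the value that was actually in memory when the comparison failed.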
(The newer __atomic_compare_exchange_n is more like the C11/C++11 API, taking an expected value by reference to be updated, and returning a bool. This may allow GCC to not waste instructions with a cmp.)
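A minimal sketch of how that builtin is typically used in a retry loop, assuming a GCC-compatible compiler. The helper name `fetch_add_cas` is hypothetical and the memory orders are just one reasonable choice; a real fetch-and-add would normally use a dedicated atomic add instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically add delta to *ptr with a CAS retry loop.
   Returns the value that was in *ptr just before the successful swap. */
static uint32_t fetch_add_cas(uint32_t *ptr, uint32_t delta)
{
    uint32_t expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(ptr, &expected, expected + delta,
                                        false, /* strong compare-exchange */
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
        /* On failure the builtin has already stored the current value of *ptr
           into expected, so the loop simply retries with the fresh value. */
    }
    return expected;
}
```

The bool return value drives the loop while the updated `expected` carries the old memory contents, so the caller gets both results from a single call.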