Are concurrent unordered writes with fencing to shared memory undefined behavior?


I have heard that it is undefined behavior to read/write the same location in memory concurrently, but I am unsure whether the same is true when there are no real race conditions involved. I suspect that the C18 standard will call it undefined behavior on principle, due to the potential to create race conditions, but I am more interested in whether this still counts as undefined behavior at an application level when these accesses are surrounded by fencing.

Setup

For context, say we have two threads A and B, set up to operate on the same location in memory. It can be assumed that the shared memory mentioned here is not used or accessible anywhere else.

// Prior to the creation of these threads, the current thread has exclusive ownership of the shared memory
pthread_t a, b;

// Create two threads which operate on the same memory concurrently
pthread_create(&a, NULL, operate_on_shared_memory, NULL);
pthread_create(&b, NULL, operate_on_shared_memory, NULL);

// Join both threads giving the current thread exclusive ownership to shared memory
pthread_join(a, NULL);
pthread_join(b, NULL);

// Read from memory now that the current thread has exclusive ownership
printf("Shared Memory: %d\n", shared_memory);

Write/Write

Each thread then runs operate_on_shared_memory, which mutates the value of shared_memory at the same time in both threads, with the caveat that both threads attempt to set the shared memory to the same unchanging constant. Even if it is a race condition, the race winner should not matter. Does this count as undefined behavior? If so, why?

int shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;

    shared_memory = SOME_CONSTANT;
    return NULL;
}

Optional Branching Write/Write

If the previous version does not count as undefined behavior, then what about this example, which first reads from shared_memory and then writes the constant to a second shared location? The important part here is that even if one or both threads succeed in entering the if statement, the outcome should still be the same.

int shared_memory = 0;
int other_shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;

    if (shared_memory != SOME_CONSTANT) {
        other_shared_memory = SOME_CONSTANT;
    }

    shared_memory = SOME_CONSTANT;
    return NULL;
}

If this is undefined behavior, then why? If the only reason is that it introduces a race condition, is there any reason why I shouldn't deem it acceptable for one thread to potentially execute an extra machine instruction? Is it because the CPU or compiler may reorder memory operations? What if I were to put atomic_thread_fence at the start and end of operate_on_shared_memory?
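
For comparison, here is roughly what I assume the unambiguously race-free version of the branching example would look like with C11 <stdatomic.h> (assuming my toolchain supports it); my question is whether the plain-int versions above are meaningfully different:

#include <stdatomic.h>

/* Sketch only: same logic as above, but the shared objects are _Atomic,
 * so the concurrent accesses are no longer a data race by definition.
 * Relaxed ordering should be enough here, since pthread_create and
 * pthread_join already synchronize with the parent thread. */
_Atomic int shared_memory = 0;
_Atomic int other_shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;

    if (atomic_load_explicit(&shared_memory, memory_order_relaxed) != SOME_CONSTANT) {
        atomic_store_explicit(&other_shared_memory, SOME_CONSTANT, memory_order_relaxed);
    }

    atomic_store_explicit(&shared_memory, SOME_CONSTANT, memory_order_relaxed);
    return NULL;
}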

Context

GCC and Clang don't seem to have any complaints. I used C18 for this test, but I don't mind referring to a later standard if it is easier to reference.

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
$ gcc -std=c18 main.c -pthread -g -O3 -Wall
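
For what it's worth, I would expect ThreadSanitizer (both GCC and Clang accept -fsanitize=thread) to flag these stores as a data race at run time, but I am more interested in what the standard says:

$ gcc -std=c18 main.c -pthread -g -O3 -Wall -fsanitize=thread
$ ./a.out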

CodePudding user response:

If the value of an object is modified at any time near where it is read via a non-qualified lvalue, machine code generated by gcc may yield behavior which is inconsistent with any particular value the object held or could have held. Situations where such inconsistencies would occur are likely rare, but I don't think there's any way of judging whether such issues could arise in machine code generated from any particular source, except by inspecting the machine code in question.

For example, (godbolt link https://godbolt.org/z/T3jd6voax) the function:

unsigned test(unsigned short *p)
{
    unsigned short temp = *p;
    unsigned short result = temp - (temp>>15);
    return result;
}

will be processed by gcc 10.2.1, when targeting the Cortex-M0 platform, into code equivalent to:

unsigned test(unsigned short *p)
{
    unsigned short temp1 = *p;
    signed short temp2 = *(signed short*)p;
    unsigned short result = temp1 + (temp2 >> 15);
    return result;
}

Although there is no way the original function could return 65535, the revised function may do so if the value at *p changes from 0 to 65535, or from 65535 to 0, between the two reads.

Many compilers are designed in a way that would inherently guarantee that an unqualified read of any word-size-or-smaller object will always yield a value the object has held or will hold in the future, but unfortunately it is rare for compilers to explicitly document such things. The only compilers that wouldn't uphold such a guarantee would be those that process code using some sequence of steps which differs from the one specified but is expected to behave identically, and compiler writers seldom see any reason to enumerate--much less document--all of the transformations that they could perform, but don't.
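
If one wants that single-read guarantee from the Standard itself rather than from undocumented compiler behavior, one way to state it explicitly is a relaxed atomic load (a sketch, assuming C11 atomics are usable on the target):

#include <stdatomic.h>

unsigned test(const _Atomic unsigned short *p)
{
    /* A single relaxed atomic load: the implementation may not split it
     * into two reads, so the function can only compute its result from
     * one value that *p actually held. */
    unsigned short temp = atomic_load_explicit(p, memory_order_relaxed);
    unsigned short result = temp - (temp >> 15);
    return result;
}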

CodePudding user response:

As long as you don't plan to count every toggle of shared_memory and other_shared_memory, and as long as you don't care whether some modifications are skipped or done twice unnecessarily, it should work.

For example, if your code is planned to simply monitor/show another system's activity for end users, it's fine: a mismatch during one microsecond isn't a serious issue.

If you plan to sample precisely two inputs and get an accurate array of results, or to do precise computations on the threads' results in shared memory, then you're doing it the wrong way.

Here, your UB is mostly that you can't guarantee that shared_memory isn't modified between the test and the assignment.

I've numbered two lines in your code:

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;

/*1*/if (shared_memory != SOME_CONSTANT) {
        other_shared_memory = SOME_CONSTANT;
    }

/*2*/shared_memory = SOME_CONSTANT;
    return NULL;
}

At the line marked 1, if for example you're toggling shared_memory between two values (SOME_CONSTANT and SOME_CONSTANT_2), then since the reads/writes aren't atomic you MAY read something different from either of the two constants. At the line marked 2, it's the same: you can't be sure that you won't be interrupted by another write, and you may end up with a value that is neither SOME_CONSTANT nor SOME_CONSTANT_2 but something else. Think about reading the upper part of one and the lower part of the other.

Also, you can "miss" a true condition at line #1, and therefore miss an update to other_shared_memory, or do it twice because the write at line #2 got messed up - so at the next test of line #1 the value will differ from SOME_CONSTANT and you'll do an unwanted update.

All this depends on several factors, like:

  • Are your writes/reads atomic anyway, despite not being explicitly atomic?
  • Can your threads really be interrupted between lines #1 and #2, or are you "protected" (hmm...) by scheduler/priorities?
  • Is the shared memory tolerant of multiple concurrent accesses, or will you lock up the chip that controls it if you make such an attempt?

You can't answer these questions? That's why it's undefined behavior...

In your particular situation, it MAY work. Or not. Or it may fail on my machine while working on yours.
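
If you do need a hard guarantee, the usual tool is to merge the test at line #1 and the write at line #2 into a single atomic read-modify-write. A rough sketch with C11 <stdatomic.h>, assuming you can afford to declare the shared objects _Atomic:

#include <stdatomic.h>

_Atomic int shared_memory = 0;
_Atomic int other_shared_memory = 0;

void *operate_on_shared_memory(void *_unused) {
    const int SOME_CONSTANT = 42;

    /* Lines #1 and #2 collapsed into one indivisible step:
     * atomic_exchange writes SOME_CONSTANT and returns the previous
     * value atomically, so if the old value differed from SOME_CONSTANT,
     * exactly one thread performs the extra update - it can't be
     * skipped by both threads or done twice. */
    if (atomic_exchange(&shared_memory, SOME_CONSTANT) != SOME_CONSTANT) {
        atomic_store(&other_shared_memory, SOME_CONSTANT);
    }
    return NULL;
}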


"Undefined behavior" is usually not properly understood. What it really means is: "You cannot predict nor guarantee what the behavior will be for ALL possible platforms".

Nothing more, nothing less. It's not a guarantee of having problems, it's the absence of a guarantee of NOT having them. It may sound like a subtle difference, but in fact it's a huge one.

By "platform", we mean the tuple built from:

  • An execution machine, including all currently running software,
  • An operating system, including its version and installed components,
  • A compiler chain, including its version and switches,
  • A build system, including all possible flags passed to compiler chain.

But UB doesn't mean "your program will act randomly"... A given set of CPU instructions will always produce the same result (under the same initial conditions); there is no randomness here. Obviously, they can be the wrong instructions for the problem you wish to solve, but it's reproducible. That's how we hunt bugs, BTW...

So, on a fixed platform, having UB means "you can't predict what will happen", and in no way "you'll face pure randomness". In fact, a LOT of programs even exploit UB, because the behavior is known on that particular platform and it's easier/cheaper/faster than doing it the proper way. Or because, even if it's officially not defined, your compiler ends up doing the same thing as the others (i.e. converting an integer down to a smaller signed integer isn't fully defined, and char is usually signed... Nearly nobody cares.).

But once your code is written, you'll know what the behavior is: it won't be undefined anymore... for YOUR platform, and ONLY this platform. Update your OS or your compiler, launch another program that can mess with the scheduling, use a faster CPU, and you MUST test again to check whether the behavior is still the same. And that's why it's annoying to have UB: it can work NOW. And cause a tricky bug a bit later.

It's one of the major reasons why industrial software often uses "old" OSes and/or compilers: upgrading them carries a HIGH risk of triggering/causing a bug, because an update corrected what was a real bug, but the project's code exploited that bug (maybe unknowingly!) and the updated software now crashes... Or worse, can destroy some hardware!

We're in 2022, and I still have a project that uses an embedded 2008 Linux, with GCC 3, VS2008, C++98, and WinXP/Qt4 on the user's machine. The project is actively maintained - and trust me, it's a pain. But upgrading the software/platform? No way. Better to deal with known bugs than to discover new ones. Cheaper, too.

One of my specialties is porting software, mostly from "old" platforms to new ones (often with 10 years or more between the two). I've seen this kind of thing a LOT of times: it worked on the old platform, it breaks on the new one, and only because UB was exploited back then and now the behavior (still undefined...) isn't the same anymore.

I'm obviously not talking about changing the C/C++ standard or the machine's endianness, where you need to rewrite code anyway, or about dealing with new OS features (like UAC on Windows). I'm talking about "normal" code, compiled without any warning, that behaves differently now and then. And you can't imagine how frequent that is, since no compiler will warn you about either high-level UB (for example, non-thread-safe functions) or instruction-level UB (a simple cast or alias can fully hide it without ANY warning).
