The data racing issue cannot be reproduced with this snippet of code

Time:01-02

I have the snippet of code below, which I want to use to demonstrate a data race in multithreaded C programming. I compiled and ran it with gcc test.c -O0 -lpthread -o test && ./test on an x86 Linux system (with only a single core) nearly a hundred times, but I never saw the printed value differ from the expected 200000. Does this mean that x86 or the compiler guarantees that every modification of an int variable is thread-safe? Or is something wrong with my program?

Edit: based on what @Sinic and I tested, the question becomes: why do data races rarely show up on a single-core CPU? Or are there no data race issues at all on a single-core CPU? As far as I know, threads are scheduled unpredictably even on a single-core CPU, so the result should be a mess there as well.

// test.c
#include <threads.h>
#include <stdio.h>
#define THREAD_COUNT 20
#define THREAD_LOOP 10000
int counter = 0; 
int run(void* data) {
  for (int i = 0; i < THREAD_LOOP; i++)
    counter++;  // <- each thread modifies the global variable counter here, without synchronization.
  printf("Thread %d terminates.\n", *((int*) data));
  return thrd_success;
}
int main(void) {
#ifndef __STDC_NO_THREADS__
  int ids[THREAD_COUNT];
  thrd_t threads[THREAD_COUNT];  
  for (int i = 0; i < THREAD_COUNT; i++) {
    ids[i] = i + 1;  // 1-based id printed by each thread when it finishes
    thrd_create(&threads[i], run, ids + i);
  }
  }
  for (int i = 0; i < THREAD_COUNT; i++)  // wait for every thread before reading counter
    thrd_join(threads[i], NULL);
  printf("Counter value is: %d.\n", counter); 
#endif
  return 0; 
} 

I also took a snapshot of the assembly code for the run function and marked the instructions that correspond to counter++.


Answer:

Does it mean that the x86 or the compiler could guarantee that every modification on an int variable is thread-safe?

No. On x86/x86-64, as on most platforms, this is not thread-safe. The assembly code shows that the increment is not performed atomically: counter++ is compiled into a separate load, add, and store.

Or anything wrong with my program?

Well, possibly. The threads are not guaranteed to run at the same time. If the operating system takes some time to create each thread, the loops may actually execute one after another. 10000 iterations is not much: mainstream modern processors can execute the loop in a very short time (e.g., a few microseconds), typically less than the time required to create a new thread. You can mitigate the problem by using more iterations and a barrier.

I didn't see any case that the printed value is incorrect (200000).

That is not the case on my machine, which does not use simultaneous multithreading (e.g., Hyper-Threading): there I do get incorrect values. This shows that the outcome depends on the execution environment and the hardware.

Is it possible that this issue relates to the fact that I ran it on a single-core CPU?

Yes, indeed.

The race condition mainly arises when one core requests a cache line that is being used by another thread. By the time a core writes back its incremented value, the value it loaded may already be stale. On x86/x86-64, when a core writes to a cache line, it invalidates the copies held by other cores. That still does not prevent the problem, because the load, add, and store are separate steps rather than one atomic operation: two cores can both load the same value before either of them stores.

If you only use 1 core and 1 hardware thread, the software threads run one at a time (interleaved), each for a scheduling quantum that is likely much longer than the time needed to execute the whole loop. In that case each thread usually finishes its loop before being preempted, so the overall execution is effectively sequential and you should see no problem. If the quantum is shorter than the time needed to execute the loop, the problem should appear, since a thread can then be interrupted in the middle of the loop (possibly between loading counter and storing it back), and other threads will modify the value it fetched in the meantime.

You can use numactl --physcpubind=0 on Linux to pin your process to a given core. On my machine, I can confirm that the problem does not appear when only one core is used. However, it does appear with at least 2 cores. If I also make THREAD_LOOP 10000 times bigger, the result is no longer correct even on one core. This confirms the quantum hypothesis explained above.

Answer:

Does it mean that the x86 or the compiler could guarantee that every modification on an int variable is thread-safe?

No. The fact that something works when you test it never means there is a guarantee.

Or anything wrong with my program?

Yes, your program has a data race. Fix the bug and the mystery will go away.
