https://www.linkedin.com/pulse/how-do-i-design-high-frequency-trading-systems-its-part-silahian-2/
avoiding cache misses and CPU’s context switching
How does the busy-wait (spinning) pattern avoid context switches if it runs two threads on one core? There will still be context switches between these two threads (the producer thread and one consumer thread), right?
What are the consequences if I don't set thread affinity to one specific core?
I completely get why it avoids cache misses, but I still have trouble understanding how it avoids context switches.
CodePudding user response:
How does the busy-wait (spinning) pattern avoid context switches if it runs two threads on one core?
When a thread tries to acquire a lock that is already held by another thread, it makes a system call so that the OS knows the thread is waiting for the lock and that there is no point in letting it continue executing. This costs a system call and a context switch (because the OS will try to run another thread on the processing unit), both of which are expensive.
Using a spin lock effectively lies to the OS: the thread does not report that it is waiting, even though it is. As a result, the OS waits until the end of the quantum (the time slice allocated to the thread) before doing a context switch. A quantum is generally pretty long (e.g. 8 ms), so the overhead of the context switches themselves does not seem to be a problem in this case. However, it is a problem if you care about latency. If another thread on the same core also uses a spin lock, execution becomes very inefficient: each thread runs for its full quantum, so the average latency is about half a quantum, which is generally far more than the overhead of a context switch. To avoid this, you should make sure there are more cores than actively running threads, and you should control the environment. Otherwise, spin locks are actually more harmful than context switches. If you do not control the environment, spin locks are generally not a good idea.
If two threads run on the same core and 2-way SMT is enabled, then each thread will likely be executed on its own hardware thread. In this case a spin lock can significantly slow down the other thread while doing no useful work. The x86-64 pause instruction can be used to tell the processor that the thread is in a spin-wait loop so that it lets the other hardware thread of the same core make progress. This instruction also has the benefit of reducing contention. Note that even with the pause instruction, modern processors will run at full speed (possibly at turbo frequency), so they consume a lot of energy (and thus produce heat).
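To make the spinning pattern concrete, here is a minimal sketch of a spin lock that issues the pause hint while it waits. It is only an illustration under a few assumptions: the names `cpu_relax`, `SpinLock` and `run_counter_demo` are made up for this example, the pause hint is emitted through the `_mm_pause()` intrinsic on x86 and compiled out elsewhere, and the demo assumes the machine has at least as many free cores as spinning threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#endif

// Tell the CPU we are in a spin-wait loop (compiled out on non-x86 targets).
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#endif
}

// Hypothetical minimal spin lock: it never makes a system call while waiting.
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        for (;;) {
            // Try to take the lock; acquire pairs with the release in unlock().
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Wait on a plain load so the cache line is not bounced by writes.
            while (locked_.load(std::memory_order_relaxed)) cpu_relax();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical demo: n_threads threads each add `iters` to a shared counter.
long run_counter_demo(int n_threads, int iters) {
    assert(n_threads > 0 && iters >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the spin lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `run_counter_demo(2, 1000)` from `main` should return 2000 if the lock is correct. If the two threads end up sharing a single core without SMT, the same call still finishes, but a waiter can burn the rest of its quantum whenever the lock holder is preempted, which is exactly the latency problem described above.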
What are the consequences if I don't set thread affinity to one specific core?
Threads can then move between cores. If a thread moves from one core to another, it generally needs to reload its data into the new core's cache (typically filling the L2 from the L3 or from another L2 cache), and the resulting cache misses may occur on the critical path of a latency-critical operation. In the worst cases, a thread can move from one NUMA node to another. Data transfers between NUMA nodes are generally slower. In addition, the core will then access data residing in the memory of another NUMA node, which is more expensive. Besides, migration can increase the overall cost of context switches.
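As a rough sketch of how pinning can be done, the snippet below uses the Linux/glibc calls `pthread_getaffinity_np` and `pthread_setaffinity_np` to restrict the calling thread to one logical CPU. It assumes Linux with glibc (other platforms use different APIs), and the helper names `first_allowed_cpu`, `pin_current_thread` and `allowed_cpu_count` are invented for this example.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Lowest-numbered CPU the calling thread may currently run on, or -1 on error.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Pin the calling thread to a single logical CPU; returns true on success.
bool pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);  // CPU_SET is only valid in this range
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Number of CPUs the calling thread is allowed to run on, or -1 on error.
int allowed_cpu_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}
```

A latency-critical thread would typically call `pin_current_thread` once at startup, before it touches its working set, so that its data is loaded into the caches of the core it will stay on. Pinning only stops this thread from migrating; it does not by itself keep other threads off that core, which is why the environment still has to be controlled.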