Force Linux to schedule processes on CPU cores that share CPU cache


Modern AMD CPUs consist of multiple core complexes (CCXs). Each CCX has its own separate L3 cache.

It's possible to set process affinity to limit a process to certain CPU cores.

Is there a way to force Linux to schedule two processes (parent process thread & child process) on two cores that share L3 cache, but still leave the scheduler free to choose which two cores?

CodePudding user response:

The underlying library functions for processes support setting CPU affinity masks, which let you define the set of cores on which a process is eligible to run: sched_setaffinity(2) for processes, pthread_setaffinity_np(3) as the equivalent for pthreads, and taskset(1) on the command line.
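A minimal sketch of the library route. The choice of cores 0-3 is an assumption for illustration; whether those four cores actually share an L3 is machine-specific:

    /* Confine the calling process to cores 0-3. Assumes (for
     * illustration only) that those four cores share an L3 cache. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < 4; cpu++)
            CPU_SET(cpu, &set);      /* a mask can have multiple bits set */

        /* pid 0 means "the calling process"; pthread_setaffinity_np()
         * is the per-thread equivalent. */
        if (sched_setaffinity(0, sizeof(set), &set) == -1) {
            perror("sched_setaffinity");
            return EXIT_FAILURE;
        }
        return 0;
    }

From the shell, taskset -c 0-3 ./myprog applies the same restriction without touching the code.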

In my experience there is very little to be gained by trying to out-think the kernel. It's already pretty good at understanding the "distance" between memory and the code requesting allocations, thanks to its NUMA awareness, so it is already predisposed to doing what you want out of the box. It is making a decision for the whole OS plus all running processes, not just your application. However, unless you're going to remove all those other processes (daemons, services, etc.), the kernel is better placed than you are to determine the best arrangement for all processes and threads.

Plus, the hardware manufacturers try very hard to make SMP performance as "flat" as possible, even though these days it's NUMA with a cache-coherency network running between cores / clusters of cores.

If you really want to wring as much out of the hardware as possible, you can start playing around with core affinities (which, I think, you can apply to both threads and memory allocations), but it's worth bearing in mind that the effort-to-benefit ratio is probably going to be poor. I once put serious effort into using core affinities to lay out an application's threads across the CPU in what looked like an ideal manner, and got basically nothing for it (<1% improvement). I haven't bothered since.

CodePudding user response:

If you manually pick a CCX, you could give both processes the same affinity mask, one that allows them to be scheduled on any of the cores in that CCX.

An affinity mask can have multiple bits set.


I don't know of a way to let the kernel decide which CCX to use and then keep both tasks on cores within it. But if the parent checks which core it's currently running on, it can set a mask that includes all cores in the CCX containing that core, assuming you have a way to detect how core numbers are grouped into CCXs, and a function to apply the resulting mask. See the sketch below.
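A hedged sketch of that idea. It assumes cores are numbered consecutively within each CCX, which is not universally true; CORES_PER_CCX is a placeholder, and on real hardware the grouping should be read from /sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list:

    /* Pin the calling process to whichever CCX it is currently running
     * on. Children forked afterwards inherit the mask, so parent and
     * child stay within one L3 domain while the kernel still chooses
     * the specific cores. */
    #define _GNU_SOURCE
    #include <sched.h>

    #define CORES_PER_CCX 8   /* assumption: consecutive numbering, 8 cores/CCX */

    static int pin_to_current_ccx(void)
    {
        int cpu = sched_getcpu();   /* core we happen to be on right now */
        if (cpu < 0)
            return -1;

        int base = (cpu / CORES_PER_CCX) * CORES_PER_CCX;

        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = base; c < base + CORES_PER_CCX; c++)
            CPU_SET(c, &set);

        return sched_setaffinity(0, sizeof(set), &set);
    }

Calling pin_to_current_ccx() in the parent before fork() is enough: the child inherits the affinity mask across fork() and execve().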

You'd want to be careful not to end up leaving some CCXs totally unused if you start multiple processes that each do this, though. Maybe once a second, do whatever top or htop do to check per-core utilization, and rebalance if the load is lopsided (i.e., change the affinity mask of both processes to the cores of a different CCX). Or put this functionality outside the processes being scheduled, so there's one "master control program" that looks at (and possibly modifies) the affinity masks of a set of tasks that it should control. (Not all tasks on the system; that would be a waste of work.)

Or, if it's looking at everything, it doesn't need to check current load averages so often; it can just count what's scheduled where. (And assume that tasks it doesn't know about, like daemons or the occasional compile job, can pick any free cores on any CCX, or at least compete fairly if all cores are busy with jobs it's managing.) See the sketch below.
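The rebalancing half of that controller reduces to rewriting another task's affinity mask from outside. A sketch, with move_task_to_ccx as a hypothetical helper and the same CORES_PER_CCX assumption as above; note the caller needs permission to retarget that pid (same user, or CAP_SYS_NICE):

    /* From the controller: move the task identified by pid onto a
     * chosen CCX by replacing its affinity mask. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/types.h>

    #define CORES_PER_CCX 8   /* assumption, as before */

    static int move_task_to_ccx(pid_t pid, int ccx)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = ccx * CORES_PER_CCX; c < (ccx + 1) * CORES_PER_CCX; c++)
            CPU_SET(c, &set);
        return sched_setaffinity(pid, sizeof(set), &set);
    }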


Obviously this is not helpful for most parent/child processes, only ones that do a lot of communication via shared memory (or maybe pipes, since kernel pipe buffers are effectively shared memory).

It is true that Zen CPUs have varying inter-core latency within / across CCXs, as well as just cache hit effects from sharing L3. https://www.anandtech.com/show/16529/amd-epyc-milan-review/4 did some microbenchmarking on Zen 3 vs. 2-socket Xeon Platinum vs. 2-socket ARM Ampere.
