Does an OS choose a memory controller when allocating memory in a server with multiple controllers?


Many newer multi-core servers from Intel and AMD come with multiple DRAM memory controllers per socket, unlike typical desktops and laptops, which have a single dual-channel controller.

When an OS (say Linux) needs memory to service an application's request, how is a particular DRAM controller chosen? I see three possibilities:

  1. Linux chooses it using some algorithm.
  2. The hardware is wired in such a way that a particular core will use a particular memory controller.
  3. There is a third component that makes this decision.

I haven't found any definitive answer.

CodePudding user response:

Pretty sure contiguous physical memory is interleaved across the controllers within a socket/package, so a single sequential read stream will be spread across all of them.

(The L3 miss is the point at which the decision to send a request to a memory controller is made, so presumably the logic in an L3 slice knows how to direct traffic to the appropriate memory controller over the ring bus or mesh (Intel) or whatever interconnect AMD uses these days. It's probably based on some function of the cache-line address, although with a non-power-of-2 number of controllers, a round-robin distribution of cache lines to controllers might require a divider? That would be surprising.)
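To make the idea concrete, here is a toy sketch of line-granularity interleaving. The 64-byte granularity, the controller count, and the plain modulo are all assumptions for illustration, not how any particular CPU actually hashes addresses:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model only: a fixed function from physical address to memory
 * controller, at cache-line granularity.  Real CPUs use their own
 * (often hashed) mapping configured by firmware. */
#define LINE_SHIFT      6   /* 64-byte cache lines                      */
#define NUM_CONTROLLERS 3   /* non-power-of-2, hence the awkward modulo */

static unsigned controller_for(uint64_t phys_addr)
{
    return (unsigned)((phys_addr >> LINE_SHIFT) % NUM_CONTROLLERS);
}

int main(void)
{
    /* A sequential stream of cache lines spreads over all controllers. */
    for (uint64_t addr = 0; addr < 8 * 64; addr += 64)
        printf("line 0x%03llx -> controller %u\n",
               (unsigned long long)addr, controller_for(addr));
    return 0;
}
```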

The BIOS/firmware may configure that, perhaps with menu options to control how.

The OS is only involved in / aware of which socket it's allocating memory on in a multi-socket server, where each physical socket has its own memory controllers. (NUMA local vs. remote: memory is faster for cores on the local socket.)

CodePudding user response:

By default, Linux uses a "first touch" allocation policy for memory: a newly instantiated page is placed in the NUMA domain of the core that first accesses it. If no free memory is available on that node, the page is allocated from another NUMA node.
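A minimal sketch of first-touch behaviour using libnuma and the move_pages() query interface; pinning to node 0 is just for illustration (build with -lnuma):

```c
/* Build: gcc first_touch.c -lnuma */
#include <numa.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Pin this thread to node 0 so the first touch happens there
     * (node 0 is just an example). */
    numa_run_on_node(0);

    size_t len = 4096;
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;

    /* mmap only reserved virtual address space; this first write is what
     * actually faults the page in, and first-touch places it on the node
     * of the CPU doing the write. */
    memset(buf, 0, len);

    /* Ask the kernel which node backs the page
     * (nodes == NULL means "query only, don't move"). */
    void *pages[1] = { buf };
    int status[1];
    if (move_pages(0, 1, pages, NULL, status, 0) == 0)
        printf("page resides on NUMA node %d\n", status[0]);

    munmap(buf, len);
    return 0;
}
```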

The BIOS configures the mapping of memory controllers to NUMA nodes, which the OS then reads from the ACPI tables (SRAT/SLIT) the BIOS provides.
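One way to inspect the topology the OS built from those firmware tables is libnuma's distance query, which ultimately reflects the ACPI SLIT data; a rough sketch:

```c
/* Build: gcc numa_topo.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int max = numa_max_node();
    printf("NUMA nodes: 0..%d\n", max);

    /* Relative distances as reported by the firmware's SLIT table;
     * by convention 10 means "local", larger means further away. */
    for (int a = 0; a <= max; a++) {
        printf("node %d:", a);
        for (int b = 0; b <= max; b++)
            printf(" %3d", numa_distance(a, b));
        printf("\n");
    }
    return 0;
}
```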

The default allocation policy can be modified or overridden using the NUMA allocation APIs or, more easily, with the "numactl" executable. Available policies include "membind" (force memory to be allocated on a specific set of NUMA nodes, or abort), "preferred" (mostly the same as the default, but it blocks automatic NUMA page migration), and "interleave" (interleave pages across a specified set of NUMA nodes).
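For reference, a hedged sketch of roughly the same three policies from C via libnuma (node 0 and the buffer size are placeholders); numactl applies the equivalent policy to a whole process from the command line:

```c
/* Build: gcc policies.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    size_t len = 1 << 20;   /* placeholder size */

    /* Roughly "membind": allocate on node 0; with strict mode the
     * allocation fails instead of silently falling back. */
    numa_set_strict(1);
    void *bound = numa_alloc_onnode(len, 0);

    /* Roughly "preferred": prefer node 0 for this thread's future page
     * faults, but fall back to other nodes if node 0 is full. */
    numa_set_preferred(0);
    char *pref = malloc(len);

    /* "interleave": spread pages round-robin over the allowed nodes. */
    void *inter = numa_alloc_interleaved(len);

    /* Touch the buffers so pages are actually instantiated. */
    if (bound) memset(bound, 0, len);
    if (pref)  memset(pref, 0, len);
    if (inter) memset(inter, 0, len);

    if (bound) numa_free(bound, len);
    free(pref);
    if (inter) numa_free(inter, len);
    return 0;
}
```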

Recent Linux kernels support automatic NUMA page migration. When enabled, the OS monitors accesses to user pages, and if a page is accessed predominantly by cores on a different NUMA node, it is migrated to that node. This actually works surprisingly well.
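On kernels built with CONFIG_NUMA_BALANCING, the feature is controlled by the kernel.numa_balancing sysctl; a small sketch to check it (the exact set of accepted values can vary between kernel versions):

```c
#include <stdio.h>

/* Print the value of the kernel.numa_balancing sysctl; a nonzero value
 * means automatic NUMA balancing is enabled.  The file only exists on
 * kernels built with CONFIG_NUMA_BALANCING. */
int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
    if (!f) {
        perror("/proc/sys/kernel/numa_balancing");
        return 1;
    }
    int val = -1;
    if (fscanf(f, "%d", &val) == 1)
        printf("kernel.numa_balancing = %d (%s)\n",
               val, val ? "enabled" : "disabled");
    fclose(f);
    return 0;
}
```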
