Let's take this processor as an example: a CPU with 2 cores and 4 threads (2 threads per core).
From what I've read, such a CPU has 2 physical cores but can process 4 threads simultaneously through hyper threading. But, in reality, one physical core can only truly run one thread at a time, but using hyper threading, the CPU exploits the idle stages in the pipeline to process another thread.
Now, here is Kubernetes with Prometheus and Grafana and their CPU resource units measurement - millicore/millicpu
. So, they virtually slice a core to 1000 millicores.
Taking into account the hyper threading, I can't understand how they calculate those millicores under the hood.
How can a process, for example, use 100millicore (10th part of the core)? How is this technically possible?
PS: accidentally, found a really descriptive explanation here: Multi threading with Millicores in Kubernetes
CodePudding user response:
This gets very complicated. So k8s doesn't actually manage this it just provides a layer on top of the underlying container runtime (docker, containerd etc). When you configure a container to use 100 millicore
k8's hands that down to the underlying container runtime and the runtime deals with it. Now once you start going to this level you have to start looking at the Linux kernel and how it does cpu scheduling / rate with cgroups. Which becomes incredibly interesting and complicated. In a nutshell though: The linux CFS Bandwidth Control
is the thing that manages how much cpu a process (container) can use. By setting the quota
and period
params to the schedular you can control how much CPU is used by controlling how long a process can run before being paused and how often it runs. as you correctly identify you cant only use a 10th of a core. But you can use a 10th of the time and by doing that you can only use a 10th of the core over time.
For example
if I set quota
to 250ms and period
to 250ms. That tells the kernel that this cgroup
can use 250ms of CPU cycle time every 250ms. Which means it can use 100% of the CPU.
if I set quota
to 500ms and keep the period
to 250ms. That tells the kernel that this cgroup
can use 500ms of CPU cycle time every 250ms. Which means it can use 200% of the CPU. (2 cores)
if I set quota
to 125ms and keep the period
to 250ms. That tells the kernel that this cgroup
can use 125ms of CPU cycle time every 250ms. Which means it can use 50% of the CPU.
This is a very brief explanation. Here is some further reading:
https://blog.krybot.com/a?ID=00750-cfae57ed-c7dd-45a2-9dfa-09d42b7bd2d7 https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html