I have a 2-core machine with hyperthreading enabled, so I have 4 vCPUs (0, 1, 2, 3). Several threads are running pinned to vCPUs 1 and 3, so that hyperthreads are not used. There is one thread that runs at 40% CPU when it is pinned to 0,1,2,3 and at 35% CPU when it is pinned to 1,3.
I don't understand why it takes more CPU when hyperthreads are used, and I have not been able to get stats that explain it. I am using Ubuntu and tried the perf stat command.
perf stat without hyperthreads (i.e. thread pinned to 1,3):
perf stat -e task-clock,cycles,instructions,cache-references,cache-misses --tid=26269 sleep 10
Performance counter stats for thread id '26269':
3341.549477 task-clock (msec) # 0.334 CPUs utilized
10836409509 cycles # 3.243 GHz
11797254268 instructions # 1.09 insn per cycle
68052778 cache-references # 20.366 M/sec
23498429 cache-misses # 34.530 % of all cache refs
perf stat -B --tid=26269 sleep 10
Performance counter stats for thread id '26269':
3112.732648 task-clock (msec) # 0.311 CPUs utilized
17296 context-switches # 0.006 M/sec
290 cpu-migrations # 0.093 K/sec
2683 page-faults # 0.862 K/sec
10043236414 cycles # 3.227 GHz
11821047920 instructions # 1.18 insn per cycle
2596058193 branches # 834.013 M/sec
30134052 branch-misses # 1.16% of all branches
perf stat with hyperthreads (i.e. thread pinned to 0,1,2,3):
perf stat -e task-clock,cycles,instructions,cache-references,cache-misses --tid=26269 sleep 10
Performance counter stats for thread id '26269':
3878.410557 task-clock (msec) # 0.388 CPUs utilized
12921569032 cycles # 3.332 GHz
11787482531 instructions # 0.91 insn per cycle
72454684 cache-references # 18.682 M/sec
19096660 cache-misses # 26.357 % of all cache refs
perf stat -B --tid=26269 sleep 10
Performance counter stats for thread id '26269':
3777.149613 task-clock (msec) # 0.378 CPUs utilized
12162 context-switches # 0.003 M/sec
1166 cpu-migrations # 0.309 K/sec
0 page-faults # 0.000 K/sec
12764333134 cycles # 3.379 GHz
11796018618 instructions # 0.92 insn per cycle
2588826495 branches # 685.392 M/sec
32417514 branch-misses # 1.25% of all branches
CodePudding user response:
When pinned to CPUs 0,1,2,3 the thread has more logical CPUs to run on. Instead of waiting for the other threads that are pinned to CPUs 1 and 3 to give up the CPU, it can run immediately on CPU 0 or 2 (which, going by the setup in the question, are the hyperthread siblings of 1 and 3), and will thus utilize more CPU% in total.
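Which logical CPUs are hyperthread siblings differs between machines, so it is worth checking before choosing a pinning. The following is a minimal sketch, assuming a Linux system that exposes the standard sysfs topology files (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`); the helper names `parse_cpu_list` and `sibling_groups` are made up for this example.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel CPU list such as '0,2' or '0-1' into a list of CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like '0-3' is inclusive on both ends.
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def sibling_groups(sysfs_root: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return one tuple of logical CPU ids per physical core.

    Returns an empty list if the sysfs files are not present
    (for example on a non-Linux system).
    """
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("logical CPUs sharing one physical core:", group)
```

If this prints the groups (0, 2) and (1, 3), then pinning to 1 and 3 would actually place both threads on the same physical core, so the result decides which pinning really avoids hyperthread sharing.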
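As you can see, the cache-miss rate also drops (from about 34.5% to about 26.4% of cache references), possibly because the thread's working set stays more localized and interferes less with the other threads when it can run on additional CPUs. Note, however, that a thread stalled on a RAM access is not idle: it still occupies the CPU and that time is counted in task-clock and cycles. Fewer cache misses would therefore lower the load if anything, so they do not explain the increase.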
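Addendum: The processor may also be less efficient when hyperthreading, as seen in the reduced value for "insn per cycle" (instructions per cycle, also known as IPC), which falls from about 1.09 to about 0.91. A likely cause is that two hyperthreads on the same physical core share its internal resources (execution units, caches, the out-of-order machinery), so each one completes fewer instructions per cycle. The instruction count is nearly identical in both runs (about 11.8 billion), while the cycle count rises from about 10.8 billion to about 12.9 billion. The thread therefore consumes more cycles for the same work, which increases the total load.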
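The arithmetic behind the IPC argument can be checked directly from the first `perf stat` run of each configuration in the question. This is only a sketch over those pasted numbers; the `PerfSample` class and the variable names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfSample:
    """Counters copied from one `perf stat` run in the question."""
    task_clock_ms: float
    cycles: int
    instructions: int

    @property
    def ipc(self) -> float:
        # Instructions per cycle, the "insn per cycle" column of perf stat.
        return self.instructions / self.cycles


# Thread pinned to CPUs 1,3 (no hyperthread sharing, per the question).
pinned_1_3 = PerfSample(3341.549477, 10836409509, 11797254268)
# Thread pinned to CPUs 0,1,2,3 (hyperthreads in use).
pinned_all = PerfSample(3878.410557, 12921569032, 11787482531)

instr_ratio = pinned_all.instructions / pinned_1_3.instructions
cycle_ratio = pinned_all.cycles / pinned_1_3.cycles
clock_ratio = pinned_all.task_clock_ms / pinned_1_3.task_clock_ms

print(f"IPC pinned to 1,3:     {pinned_1_3.ipc:.2f}")
print(f"IPC pinned to 0-3:     {pinned_all.ipc:.2f}")
print(f"instructions ratio:    {instr_ratio:.3f}")
print(f"cycles ratio:          {cycle_ratio:.3f}")
print(f"task-clock ratio:      {clock_ratio:.3f}")
```

The instruction ratio comes out very close to 1 while the cycle and task-clock ratios are both well above 1, which is consistent with the thread doing the same amount of work at a lower IPC rather than doing more work.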