Hyperthread consumes more CPU for same work compared to CPU consumed while hyperthread is disabled

Time:12-15

I have a 2-core machine with hyperthreading enabled, so I have 4 vCPUs (0, 1, 2, 3). Several threads are running pinned to vCPUs 1 and 3, so that hyperthreads are not used. One particular thread runs at 40% CPU when it is pinned to 0,1,2,3 and at 35% CPU when it is pinned to 1,3.

I don't understand why it takes more CPU when hyperthreads are used, and I have not been able to get stats that prove it. I am using Ubuntu and tried the perf stat command.

Perf stat without hyperthreads (i.e. thread pinned to 1,3)

perf stat -e task-clock,cycles,instructions,cache-references,cache-misses --tid=26269 sleep 10

Performance counter stats for thread id '26269':

   3341.549477      task-clock (msec)         #    0.334 CPUs utilized
   10836409509      cycles                    #    3.243 GHz
   11797254268      instructions              #    1.09  insn per cycle
      68052778      cache-references          #   20.366 M/sec
      23498429      cache-misses              #   34.530 % of all cache refs

perf stat -B --tid=26269 sleep 10

Performance counter stats for thread id '26269':

   3112.732648      task-clock (msec)         #    0.311 CPUs utilized
         17296      context-switches          #    0.006 M/sec
           290      cpu-migrations            #    0.093 K/sec
          2683      page-faults               #    0.862 K/sec
   10043236414      cycles                    #    3.227 GHz
   11821047920      instructions              #    1.18  insn per cycle
    2596058193      branches                  #  834.013 M/sec
      30134052      branch-misses             #    1.16% of all branches

Perf stat with hyperthreads (i.e. thread pinned to 0,1,2,3)

perf stat -e task-clock,cycles,instructions,cache-references,cache-misses --tid=26269 sleep 10

Performance counter stats for thread id '26269':

   3878.410557      task-clock (msec)         #    0.388 CPUs utilized
   12921569032      cycles                    #    3.332 GHz
   11787482531      instructions              #    0.91  insn per cycle
      72454684      cache-references          #   18.682 M/sec
      19096660      cache-misses              #   26.357 % of all cache refs

perf stat -B --tid=26269 sleep 10

Performance counter stats for thread id '26269':

   3777.149613      task-clock (msec)         #    0.378 CPUs utilized
         12162      context-switches          #    0.003 M/sec
          1166      cpu-migrations            #    0.309 K/sec
             0      page-faults               #    0.000 K/sec
   12764333134      cycles                    #    3.379 GHz
   11796018618      instructions              #    0.92  insn per cycle
    2588826495      branches                  #  685.392 M/sec
      32417514      branch-misses             #    1.25% of all branches

CodePudding user response:

When pinned to CPUs 0,1,2,3, the thread has more logical CPUs to choose from. Instead of waiting for the other threads pinned to CPUs 1 and 3 to finish their work, it can run immediately on a free logical CPU, and so it uses more CPU% in total.
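Which logical CPUs are separate cores and which are hyperthread siblings depends on how the kernel numbers them, so it may be worth checking before reasoning about CPUs 1 and 3. The following is a minimal sketch, assuming a Linux system that exposes the usual sysfs topology files; the helper names `parse_cpu_list` and `sibling_groups` are made up for this illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,2' or '0-1,4' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return one tuple of logical CPU ids per physical core.

    Assumes the Linux sysfs layout; returns an empty list elsewhere.
    """
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("logical CPUs sharing one physical core:", group)
```

If CPUs 1 and 3 show up in the same group, they are two hyperthreads of one core rather than two separate cores, which would change how the two measurements should be read.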

As the counters show, cache misses are also reduced (from about 34.5% to about 26.4% of all cache references), since the thread's cache is more localized and interferes less with the other threads when it is able to run on additional logical CPUs. This may further increase the CPU load, since the thread spends less time idling while it waits for RAM access.

Addendum: The processor may also be less efficient when hyperthreading is in use, as seen in the lower "insn per cycle" value (instructions per cycle, also known as IPC): about 0.91 compared to 1.09. Two hyperthreads on the same physical core share its internal resources (the pipeline, the out-of-order engine and the superscalar execution units), so each thread can get less done per cycle. The thread then consumes more cycles for the same number of instructions, which increases the total load.
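The IPC argument can be checked with simple arithmetic on the numbers from the question. This is only a sketch: the values are copied from the first perf stat run of each configuration, and the names `pinned_1_3`, `pinned_all`, `ipc` and `ratio` are made up for the illustration.

```python
# Counter values copied from the first perf stat run of each configuration.
pinned_1_3 = {
    "task_clock_ms": 3341.549477,
    "cycles": 10836409509,
    "instructions": 11797254268,
}
pinned_all = {
    "task_clock_ms": 3878.410557,
    "cycles": 12921569032,
    "instructions": 11787482531,
}


def ipc(run):
    """Instructions retired per CPU cycle for one perf stat run."""
    return run["instructions"] / run["cycles"]


def ratio(key):
    """How much larger a counter is when pinned to 0,1,2,3 vs. 1,3."""
    return pinned_all[key] / pinned_1_3[key]


if __name__ == "__main__":
    print(f"IPC pinned to 1,3:     {ipc(pinned_1_3):.2f}")
    print(f"IPC pinned to 0,1,2,3: {ipc(pinned_all):.2f}")
    for key in ("instructions", "cycles", "task_clock_ms"):
        print(f"{key} ratio: {ratio(key):.3f}")
```

The instruction counts are almost identical in both runs, so the thread is doing the same work. The cycle count is roughly 19% higher when pinned to 0,1,2,3 and the task-clock roughly 16% higher, which is in the same range as the 35% to 40% difference reported in the question.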
