Does multicore processors really perform work in parallel?


I am dealing with threading and related topics such as processes and context switching. My understanding is that on a system with one multicore processor, more than one process cannot really run at the same time; we just have an illusion of simultaneous work because of process context switching.

But what about threads within one process running on a multicore processor? Do they really work simultaneously, or is that also just an illusion? Can a processor with 2 hardware cores work on two threads at a time? If not, what is the point of multicore processors?

CodePudding user response:

Can a processor with 2 hardware cores work on two threads at a time?

Yes,...

...But imagine yourself back in Victorian times, hiring a bunch of clerks to perform a complex computation. All of the data that they need are in one book, and they're supposed to write all of their results back into the same book.

The book is like a computer's memory, and the clerks are like individual CPUs. Since only one clerk can use the book at any given time, it might seem as if there's no point in having more than one of them,...

...unless you give each clerk a notepad. They can go to the book, copy some numbers, and then work for a while just from their own notepad, before they return to copy a partial result from their notepad into the book. That allows the other clerks to do some useful work while any one clerk is at the book.

The notepads are like a computer's Level 1 caches: relatively small areas of high-speed memory, each associated with a single CPU, which hold copies of data that have been read from, or need to be written back to, main memory. The computer hardware automatically copies data between main memory and the cache as needed, so the program does not necessarily need to be aware that the cache even exists. (See https://en.wikipedia.org/wiki/Cache_coherence)

But the programmer should be aware: if you can structure your program so that different threads spend most of their time reading and writing private variables, and relatively little time accessing variables that are shared with other threads, then most of the private-variable accesses will go no further than the L1 cache, and the threads will be able to truly run in parallel. If, on the other hand, the threads all try to use the same variables at the same time, or they all iterate over amounts of data too large to fit in the cache, then they will have much less ability to work in parallel.
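Here is a minimal C++ sketch of that idea (the thread count and array size are purely illustrative): each thread sums its share of an array into a thread-private local variable, which can stay in that core's L1 cache, and only touches the shared result once at the end.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    constexpr std::size_t kNumThreads = 4;        // illustrative
    std::vector<std::uint64_t> data(1 << 20, 1);  // shared, read-only input
    std::atomic<std::uint64_t> total{0};          // shared result

    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / kNumThreads;
    for (std::size_t t = 0; t < kNumThreads; ++t) {
        workers.emplace_back([&, t] {
            std::uint64_t local = 0;  // thread-private: lives in a register/L1
            for (std::size_t i = t * chunk; i < (t + 1) * chunk; ++i)
                local += data[i];
            total += local;           // touch shared memory only once
        });
    }
    for (auto& w : workers) w.join();
    std::cout << total.load() << '\n';
}
```

Accumulating directly into `total` inside the inner loop would instead make every core fight over the same cache line, serializing much of the work.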

See also:

https://en.wikipedia.org/wiki/Cache_hierarchy

CodePudding user response:

Multiple cores really do perform work in parallel (at least on all mainstream modern CPU architectures). Processes have one or more threads. The OS scheduler schedules active tasks, which are generally threads, onto the available cores. When there are more active tasks than available cores, the OS uses preemption to execute tasks concurrently on each core.
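A rough way to see this (the loop size is an arbitrary assumption about the machine): run the same CPU-bound loop in two threads and time it. With at least two free cores, the wall time is close to that of a single run rather than double.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

std::atomic<std::uint64_t> sink;  // keeps the compiler from deleting the loop

void burn() {
    std::uint64_t x = 0;
    for (std::uint64_t i = 0; i < 500'000'000; ++i) x += i;
    sink += x;
}

int main() {
    auto start = std::chrono::steady_clock::now();
    std::thread a(burn), b(burn);
    a.join();
    b.join();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    // Close to the single-run time on >= 2 free cores; roughly doubled on 1.
    std::cout << "two threads took " << ms << " ms\n";
}
```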

In practice, software applications can perform synchronization that may leave some cores inactive for a period of time. Hardware operations can also cause this (e.g. waiting for data to be fetched from memory, or performing an atomic operation).

Moreover, on modern processors, physical cores are often split into multiple hardware threads that can each execute a different task. This is called SMT (aka Hyper-Threading). On fairly recent x86 processors, the 2 hardware threads of the same core can execute 2 tasks truly in parallel. The tasks share parts of the physical core, such as the execution units, so using 2 hardware threads can be faster than using 1 for some tasks (typically ones that do not fully use the core's resources).
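In standard C++, `std::thread::hardware_concurrency()` reports how many hardware threads (cores times SMT threads per core) the system exposes; note it may return 0 if the value cannot be determined.

```cpp
#include <iostream>
#include <thread>

int main() {
    // On a 4-core CPU with 2-way SMT this typically prints 8.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "hardware threads: " << n << '\n';  // 0 means unknown
}
```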

Having 2 hardware threads that cannot truly run in parallel, but run concurrently at a fine granularity, can still be beneficial for performance. In fact, that was the case for a long time (during the last decade). For example, when a task is latency-bound (e.g. waiting for data to be retrieved from RAM), another task can be scheduled to do some work in the meantime, improving overall efficiency. This was the initial goal of SMT. The same is true for preempted tasks on the same core (though the granularity needs to be much larger): one process can start a networking operation and be preempted so another process can do some work, before being preempted itself when the data arrives from the network.
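A small sketch of that overlap, with a sleep standing in for a blocking network or memory wait (the durations are made up): while one thread is blocked, the other keeps the core busy, so total wall time approaches the longer of the two rather than their sum.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

std::atomic<std::uint64_t> sink;  // keeps the compiler from deleting the loop

int main() {
    using namespace std::chrono;
    auto start = steady_clock::now();

    // Stand-in for a latency-bound task (e.g. waiting on the network).
    std::thread waiter([] { std::this_thread::sleep_for(milliseconds(200)); });

    // A compute-bound task keeps the CPU busy during that wait.
    std::thread worker([] {
        std::uint64_t x = 0;
        for (std::uint64_t i = 0; i < 100'000'000; ++i) x += i;
        sink += x;
    });

    waiter.join();
    worker.join();
    auto ms = duration_cast<milliseconds>(steady_clock::now() - start).count();
    // Close to max(wait time, compute time) rather than their sum.
    std::cout << "overlapped time: " << ms << " ms\n";
}
```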
