Multi-processing takes advantage of a multi-core CPU by distributing tasks across several processes. The CPU then works on "this program's tasks" more often, since different cores can load different processes and execute them at the same time. On the other hand, my understanding of multi-threading is that different threads of the same process can also be scheduled on different CPU cores and run in parallel. So if we put inter-process/inter-thread communication aside, is multi-processing just a way to gain more CPU resources? I use both techniques without fully understanding what happens at the firmware/hardware level...
Let's say I have a program handling a bunch of purely CPU-bound tasks with the same body and different parameters. These tasks can run fully in parallel. What would be different if I used multi-processing (N processes) vs. multi-threading (N threads)?
CodePudding user response:
When it comes to pure computational power on average computers, there is usually no difference in performance (IPC and startup overhead aside). Both are ways to utilize multiple cores.
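As a rough illustration of the question's setup (same task body, different parameters, N workers), here is a minimal Python sketch that runs the work through both kinds of pool. The names `cpu_task` and `run_all` are made up for this example. One Python-specific caveat: in standard CPython builds the global interpreter lock lets only one thread execute Python bytecode at a time, so in this particular language the thread version will not spread pure-Python work across cores; in languages with truly parallel threads the two variants behave much more alike.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n: int) -> int:
    """Hypothetical CPU-bound task: sum of the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_all(executor_cls, params, workers=4):
    """Run cpu_task over params with `workers` threads or processes."""
    with executor_cls(max_workers=workers) as pool:
        # map() preserves input order, so both pool kinds return the same list.
        return list(pool.map(cpu_task, params))


if __name__ == "__main__":
    params = [200_000, 300_000, 400_000, 500_000]
    by_threads = run_all(ThreadPoolExecutor, params)
    by_processes = run_all(ProcessPoolExecutor, params)
    print(by_threads == by_processes)
```

The calling code is identical for both; only the executor class changes, which is the sense in which the two approaches are interchangeable for independent CPU-bound tasks.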
When it comes to HPC (high-performance computing), multiple processes can have different resources allocated to each process. For example, each NUMA node in a NUMA system has its own local memory, so keeping a process on one node achieves higher throughput through less resource contention (including contention for cores). NUMA aside, you can also pin each process to certain cores and manage the priority of each process manually, which matters for time-critical applications where you cannot rely on the OS scheduler to prioritize your process.
Similar functionality is available for threads on some systems, but it requires major code modifications, while the process-based approach is simpler and more portable.
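The core pinning mentioned above can be sketched in Python with `os.sched_setaffinity`, which is only available on some platforms (notably Linux). The helper name `pin_to_cores` is hypothetical. On Linux the underlying call applies to the calling thread, and threads or child processes created afterwards inherit the mask, so a worker would typically call this once at startup.

```python
import os


def pin_to_cores(cores):
    """Restrict the caller to the given set of core indices.

    Returns the resulting affinity set, or None where the platform
    does not expose sched_setaffinity (e.g. Windows, macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)  # pid 0 means "the caller"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)  # cores we may currently use
        print(pin_to_cores({min(allowed)}))  # pin to a single allowed core
        pin_to_cores(allowed)  # restore the original mask
```

This only covers affinity; adjusting scheduling priority usually needs extra privileges and is left out of the sketch.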
Having multiple processes also allows scalability and isolation:
- later on you can have each process running on a different computer, connected over the network (cluster computing)
- you get better error handling when one worker fails or crashes, since it can be restarted without taking down the others (the approach orchestrators such as Kubernetes rely on)
- you can have each team working on a different "process" in your application.
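The isolation point can be shown with a small Python sketch in which a parent supervises two worker processes, one of which aborts. The worker snippets and the names `run_worker` and `supervise` are invented for this illustration; a real supervisor would restart or reschedule the failed worker rather than just record its status.

```python
import subprocess
import sys


def run_worker(snippet: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a separate interpreter process."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )


# Hypothetical workers: one finishes normally, one crashes hard.
WORKERS = {
    "healthy": "print(sum(range(10)))",
    "crashing": "import os; os.abort()",
}


def supervise(workers):
    """Run every worker and collect (exit status, stdout) per worker."""
    report = {}
    for name, snippet in workers.items():
        done = run_worker(snippet)
        report[name] = (done.returncode, done.stdout.strip())
    return report


if __name__ == "__main__":
    for name, (code, out) in supervise(WORKERS).items():
        status = "ok" if code == 0 else f"failed (exit status {code})"
        print(f"{name}: {status} {out}")
```

The parent keeps running and sees a non-zero exit status for the crashed worker. A thread that hit the same kind of fatal error would take the whole process down with it.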
In short, threads are useful if you are running a small computation concurrently or need the shared address space for low latency. If you are doing an application-scale computation, it's best to isolate it in its own process/application; this is how most simulators work. In such applications multiprocessing and multithreading go hand in hand, so you have multiple processes, each with its own set of threads.