How does a program like Folding@home work? Does my computer individually perform a unit of "work" completely separate from other computers running Folding@home, and then send the answer back when it's completed?
Or does Folding@home treat all the computers connected to it as if the project had, say, 1000 cores, so that when work is done it's the equivalent of running something like
make -j <total number of cores>
Answer:
Projects like Folding@Home and BOINC are examples of loosely-coupled parallel computing, where each task is fully self-contained and can be completed without communication with other computing entities. They are also examples of a pattern known as controller/worker (formerly known as master/worker), in which a central controller splits a large task into a pool of smaller subtasks and distributes them to a bunch of worker processes on a first-come, first-served basis. That corresponds to your first point.
In F@H (and BOINC), client computers connect to the server, request a task, work on it until it's complete, then connect to the server again to return the result and request a new task. The benefits of this are automatic load balancing, fault tolerance (via redundancy), and no need for scheduling.
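That pull-based loop is simple enough to sketch. Here is a minimal, self-contained illustration in Python, with threads and an in-process queue standing in for real client machines and the project server (the `worker` function, the queue, and the summation "task" are all invented for the example; this is not the actual F@H protocol):

```python
import queue
import threading

task_queue = queue.Queue()
results = {}

def worker(worker_id):
    while True:
        try:
            task_id, data = task_queue.get_nowait()  # "request a task"
        except queue.Empty:
            return  # no work left; this client goes idle
        # Tasks are fully self-contained, so no communication with
        # other workers is needed while computing.
        result = sum(data)  # stand-in for the real computation
        results[task_id] = (worker_id, result)  # "return the result"
        task_queue.task_done()

# Controller: split one large job into small, self-contained work units.
for task_id in range(10):
    task_queue.put((task_id, list(range(task_id, task_id + 100))))

# Load balancing is automatic: a fast worker finishes its unit sooner
# and simply comes back for another one.
workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(f"collected {len(results)} of 10 results")
```

Note that the controller never decides in advance which worker gets which task, which is why workers of wildly different speeds can coexist.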
When you run `make -j #cores`, `make` launches a number of parallel jobs, but those jobs are usually interdependent, so `make` has to schedule them in an optimal order. The jobs then run as processes on the same computer, which affords `make` full process control. If a build step fails, the entire build aborts immediately and the user can quickly look into the problem, fix it, and restart the build. That model is not viable when a client computer could have an arbitrary compute speed, could connect and disconnect at any time, and/or could decide to simply stop processing tasks. There are distributed versions of `make`, such as `dmake`, that run different parts of the build process on different remote nodes, but that still happens in a tightly controlled environment, typically on a build cluster.
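To make the contrast concrete, here is a rough sketch of the scheduling burden `make -j` carries, using a hand-written dependency graph with invented targets and a dummy `build` step (this is not how GNU make is actually implemented):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Toy dependency graph in the spirit of a Makefile: each target lists
# its prerequisites. Targets and build() are made up for illustration.
deps = {
    "a.o": [],
    "b.o": [],
    "lib.a": ["a.o", "b.o"],  # must wait for both object files
    "app": ["lib.a"],
}

def build(target):
    print(f"building {target}")
    return target  # a real step would run a command and raise on failure

done = set()
running = {}  # future -> target
with ThreadPoolExecutor(max_workers=4) as pool:  # the "-j 4" part
    while len(done) < len(deps):
        # Launch every target whose prerequisites are all built. This
        # dependency-aware scheduling is exactly the work the pull
        # model avoids by requiring tasks to be independent.
        for target, prereqs in deps.items():
            if (target not in done
                    and target not in running.values()
                    and all(p in done for p in prereqs)):
                running[pool.submit(build, target)] = target
        finished, _ = wait(running, return_when=FIRST_COMPLETED)
        for fut in finished:
            # Fail fast: an exception here propagates and aborts the
            # whole build, practical only under full local control.
            done.add(fut.result())
            del running[fut]
```

The scheduler has to track which jobs are eligible to run at every moment, something the F@H controller never needs to do because its work units have no prerequisites.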
Note that at a very high level of abstraction the two are basically equivalent, with the main difference being whether jobs are pushed (make) or pulled (F@H). While job pulling works fine on all kinds of systems, job pushing usually requires (tightly-coupled) systems with predictable characteristics and good scheduling algorithms in order to be efficient.