I tried to use dask's LocalCluster in a multiprocess, single-thread-per-process setup on Linux, but so far it doesn't behave as I expected:
from dask.distributed import LocalCluster, Client, progress

def do_work():
    while True:
        pass
    return

if __name__ == '__main__':
    cluster = LocalCluster(n_workers=2, processes=True, threads_per_worker=1)
    client = Client(cluster)
    futures = [client.submit(do_work) for i in range(100)]
    progress(futures)
What happens is that dask does launch two processes (via spawn_main), but each process has 7 threads. Besides that, the main process itself (which I launched with python -m) has 20 threads. I tried the nthreads=1 kwarg as well, and it didn't work either.
OTOH, the dashboard at :8787 shows that I only have one thread per process. What did I miss here?
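For reference, the per-process thread counts I mention above can be checked on Linux by counting the entries under /proc/&lt;pid&gt;/task (a minimal sketch; it assumes a Linux /proc filesystem, and os_thread_count is just an illustrative helper name):

```python
import os

def os_thread_count(pid=None):
    # On Linux, each OS-level thread of a process shows up as a
    # directory under /proc/<pid>/task.
    pid = os.getpid() if pid is None else pid
    return len(os.listdir(f"/proc/{pid}/task"))

print(os_thread_count())  # thread count of the current process
```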
CodePudding user response:
When you set the number of threads, you normally mean the number of worker threads: threads that can take up a whole CPU core with useful compute. However, there are other threads which take up very little CPU and help keep the process running:
- the server listening to incoming messages (running an asyncio event loop)
- a dashboard per worker
- performance monitoring/profiling
You can get a list of these per worker by doing (note the threading import, which the lambda needs):

import threading
client.run(lambda: [str(t) for t in threading.enumerate()])
As for the main process, it creates threads that specifically listen to the subprocesses, as well as its own profiling, IO loop/comms, etc.
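The same threading.enumerate() call is useful outside dask as well. A minimal local sketch (no dask needed; thread_names is a hypothetical helper, and I use a plain ThreadPoolExecutor just to illustrate) showing how named worker threads appear alongside the main thread:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def thread_names():
    # Names of all threads alive in this process right now.
    return sorted(t.name for t in threading.enumerate())

before = thread_names()  # typically just ['MainThread']

# A pool with one worker thread; the thread is started lazily,
# on the first submit.
pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker")
pool.submit(lambda: None).result()
after = thread_names()

print(before)
print(after)  # now also contains a 'worker_0' thread
pool.shutdown()
```

In a dask worker you would see a similar picture, except the extra threads belong to the event loop, dashboard, and profiler rather than to a ThreadPoolExecutor you created yourself.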