Home > other >  How to control python dask's number of threads per worker in linux?
How to control python dask's number of threads per worker in linux?

Time:09-06

I tried to use dask localcluster, in multiprocess but single thread per process setup, in linux, but failed so far:

from dask.distributed import LocalCluster, Client, progress

def do_work():
    while True:
        pass
    return

if __name__ == '__main__':
    cluster = LocalCluster(n_workers=2, processes=True, threads_per_worker=1)
    client = Client(cluster)
    futures = [client.submit(do_work) for i in range(100)]
    progress(futures)

What happens is that dask indeed launches two processes (via spawn_main), but each process has 7 threads. Besides that, the main process itself (which I launched with python -m) has 20 threads. I tried the nthreads=1 kwarg as well and it didn't work either.

OTOH, the dashboard at :8787 shows that I only have one thread per process. What did I miss here?

CodePudding user response:

When you set number of threads, you normally mean the number of worker threads: that can take up a whole CPU core with useful compute. However there are other threads which take up very little CPU and help keep the process running:

  • the server listening to incoming messages (running an asyncio event loop)
  • a dashboard per worker
  • performance monitoring/profiling

You can get a list of these per worker by doing

client.run(lambda: [str(_) for _ in threading.enumerate()])

As to the main process, it creates threads that specifically listen to the subprocesses, as well as its own profiling, IO loop/comms, etc.

  • Related