Home > other >  How to set --num-executors in dataproc cluster
How to set --num-executors in dataproc cluster

Time:12-09

We have a dataproc cluster with 10 Nodes and unable to understand how to set the parameter for --num-executor for spark jobs.

Below are the points which are confusing -

  • Can we have less executor than number of worker nodes. If yes what will happen to idle worker nodes.
  • Can we have more executor than number of worker nodes. where additional executor will run ?
  • Can we run more than 1 spark application running using 1 executor per node ?
  • Can we run 1 spark application using more than 1 executor per node ?

CodePudding user response:

An executor is just a Java process running running on their own JVM on some machine somewhere.

To answer your questions:

  • Can we have less executor than number of worker nodes. If yes what will happen to idle worker nodes.
    • Yes you can. That just means that the idle worker nodes won't have this Java process running on there.
  • Can we have more executor than number of worker nodes. where additional executor will run?
    • Yes you can, provided a single worker node can host your 2 JVMs (enough memory/cpu/...) and your resource manager can handle it. The resource manager will decide which node will be hosting the additional executor process.
  • Can we run more than 1 spark application running using 1 executor per node?
    • This means that you'll have more than 1 executor on a single worker node, so same as point 2.
  • Can we run 1 spark application using more than 1 executor per node ?
    • This means that you'll have more than 1 executor on a single worker node, so same as point 2.

Hope this helps!

  • Related