We have a Dataproc cluster with 10 worker nodes and are unable to understand how to set the --num-executors parameter for Spark jobs.
Below are the points that are confusing us:
- Can we have fewer executors than worker nodes? If yes, what happens to the idle worker nodes?
- Can we have more executors than worker nodes? Where will the additional executors run?
- Can we run more than one Spark application, each using one executor per node?
- Can we run one Spark application using more than one executor per node?
CodePudding user response:
An executor is just a Java process running in its own JVM on some machine in the cluster.
To answer your questions:
- Can we have fewer executors than worker nodes? If yes, what happens to the idle worker nodes?
- Yes, you can. It just means the idle worker nodes won't run an executor process for your application.
- Can we have more executors than worker nodes? Where will the additional executors run?
- Yes, you can, provided a single worker node can host two (or more) executor JVMs (enough memory, CPU, ...) and your resource manager allows it. The resource manager decides which nodes host the additional executor processes.
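To make "provided a single worker node can host your JVMs" concrete, here is a minimal sketch of the capacity math a resource manager such as YARN effectively performs per node. The function name and the resource figures are made up for illustration; real schedulers also account for container overhead and reserved system memory.

```python
def executors_per_node(node_mem_gb, node_cores,
                       executor_mem_gb, executor_cores):
    """Max executors one worker node can host, limited by whichever
    resource (memory or cores) runs out first."""
    by_mem = node_mem_gb // executor_mem_gb
    by_cores = node_cores // executor_cores
    return min(by_mem, by_cores)

# Example: a worker with 52 GB RAM and 16 cores, hosting
# executors that each request 6 GB and 2 cores:
print(executors_per_node(52, 16, 6, 2))  # -> 8
```

So on a 10-node cluster of such workers, asking for, say, 20 executors is fine: the scheduler simply packs roughly two executors onto each node until the request is satisfied or a resource is exhausted.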
- Can we run more than one Spark application, each using one executor per node?
- Yes. That places more than one executor on a single worker node (one per application), so the same conditions as point 2 apply.
- Can we run one Spark application using more than one executor per node?
- Yes. Again, this places multiple executors on a single worker node, so the same conditions as point 2 apply.
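As for actually setting the executor count on Dataproc: a sketch of a job submission is below (cluster name, region, class, and jar path are placeholders). One caveat worth knowing: Dataproc enables Spark dynamic allocation by default, in which case `spark.executor.instances` is only an initial target, so disable it if you want a fixed number of executors.

```shell
# Sketch only -- cluster/region/class/jar are hypothetical placeholders.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=com.example.MyJob \
  --jars=gs://my-bucket/my-job.jar \
  --properties=spark.executor.instances=20,\
spark.executor.cores=2,\
spark.executor.memory=6g,\
spark.dynamicAllocation.enabled=false
```

With these properties, the equivalent of `--num-executors 20` is expressed as `spark.executor.instances=20`, and the resource manager spreads those 20 executors across the 10 workers as capacity allows.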
Hope this helps!