I have an EMR cluster (1 master node, 1 core node), and I submitted my Spark application in cluster deploy mode.
From the documentation, I know the driver runs inside the Spark Application Master in this deploy mode, but which node (master or core) will YARN select to run the Spark Application Master? Is it always the master node?
Thanks.
CodePudding user response:
The Application Master never runs on the master instance of the cluster (other than an edge case where you are running a single node "cluster" with no core instances at all).
The Application Master runs on a random core/task instance of the cluster. It runs in a YARN container, so it must run on an instance that is running the YARN NodeManager. The master instance runs the YARN ResourceManager, and the core/task instances run YARN NodeManager.
Also, the driver does not always run inside the Application Master process. In fact, by default (meaning "client" deploy-mode) it does not run inside the Application Master process. In this case, the driver (running on the master instance) and the Application Master (running on a random core/task instance) are two completely separate things.
If you run Spark in "cluster" deploy-mode (e.g., by adding --deploy-mode cluster to the spark-submit args), then and only then will the driver run inside of the Application Master, and it will be on a random core/task instance. The only thing running on the master instance in this case will be a thin wrapper process that polls the status of the application running in YARN.
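To make the two cases concrete, here is a rough Python sketch of the placement rules described above. The function names (spark_submit_args, placement) and the app path my_app.py are made up for illustration, and it assumes spark-submit is invoked on the master instance, which is the usual setup on EMR. Only the --master yarn and --deploy-mode flags are real spark-submit options; the rest just restates this answer as code.

```python
from typing import Dict, List


def spark_submit_args(app: str, deploy_mode: str = "client") -> List[str]:
    """Build a spark-submit command line for YARN.

    `app` is a placeholder path; "client" is the default deploy mode.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]


def placement(deploy_mode: str, has_core_or_task_nodes: bool = True) -> Dict[str, str]:
    """Return where the Application Master and the driver end up.

    Assumption: spark-submit is launched from the master instance.
    """
    # The AM needs a YARN NodeManager, which runs on core/task instances.
    # Edge case: a single-node "cluster" has only the master instance.
    am_host = "core/task instance" if has_core_or_task_nodes else "master instance"
    if deploy_mode == "cluster":
        # The driver runs inside the Application Master process.
        return {"application_master": am_host, "driver": am_host}
    if deploy_mode == "client":
        # The driver stays in the spark-submit process on the submitting host.
        return {"application_master": am_host, "driver": "master instance"}
    raise ValueError(f"unknown deploy mode: {deploy_mode}")


if __name__ == "__main__":
    for mode in ("client", "cluster"):
        print(" ".join(spark_submit_args("my_app.py", mode)), "->", placement(mode))
```

For your cluster (1 master, 1 core), placement("cluster") puts both the driver and the Application Master on the core instance, since it is the only one running a NodeManager.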