Does the Spark Application Master always run on the master node of an EMR cluster?

Time:03-02

I have an EMR cluster (1 master node, 1 core node) and I submitted my Spark application in cluster deploy mode.

From the documentation, I know that in this deploy mode the driver runs inside the Spark Application Master, but which node (master or core) will YARN select to run the Spark Application Master? Is it always the master node? Thanks.

CodePudding user response:

The Application Master never runs on the master instance of the cluster (other than an edge case where you are running a single node "cluster" with no core instances at all).

The Application Master runs on one of the core/task instances of the cluster. YARN picks the instance, so you cannot count on a particular one. It runs in a YARN container, so it must run on an instance that is running the YARN NodeManager. The master instance runs the YARN ResourceManager, and the core/task instances run the YARN NodeManager.

Also, the driver does not always run inside the Application Master process. In fact, by default (meaning "client" deploy-mode) it does not run inside the Application Master process. In this case, the driver (running on the master instance) and the Application Master (running on a random core/task instance) are two completely separate things.

If you run Spark in "cluster" deploy-mode (e.g., by adding --deploy-mode cluster to the spark-submit args), then and only then will the driver run inside of the Application Master, and it will be on a random core/task instance. The only thing running on the master instance in this case will be a thin wrapper process that polls the status of the application running in YARN.
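One way to check this on your own cluster is to have the application report the host its driver code runs on. The following is only a minimal sketch using the Python standard library: the script name where_is_my_driver.py and the helper names are made up for illustration, and only the spark-submit options --master yarn and --deploy-mode come from Spark itself.

```python
import shlex
import socket

# The two deploy modes discussed above.
VALID_DEPLOY_MODES = ("client", "cluster")


def build_submit_command(app_path, deploy_mode="client"):
    """Return the spark-submit argument list for a YARN submission.

    The default mirrors Spark's own default deploy mode ("client").
    This helper is hypothetical; it only assembles the command line.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        app_path,
    ]


def driver_host_message():
    """Describe the host that is executing this (driver-side) code."""
    return f"driver is running on host: {socket.gethostname()}"


if __name__ == "__main__":
    # When this file is submitted as the application, this line executes
    # in the driver process, wherever the deploy mode placed it.
    print(driver_host_message())
    # Show the command that would submit this file in cluster mode
    # (file name is an assumption for this sketch).
    print(shlex.join(build_submit_command("where_is_my_driver.py", "cluster")))
```

Submitted in client mode from the master instance, the first line should show the master's hostname in your terminal. In cluster mode the driver's output goes to the application's YARN container logs instead, and the hostname there should be that of a core/task instance.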
