Running Spark on Kubernetes, with each of 3 Spark workers given 8 cores and 8G ram, results in
Executor app-xxx-xx/0 finished with state KILLED exitStatus 143
seemingly no matter how simple the computation or what flags I pass to spark-submit
.
For instance,
kubectl run -n redacted spark-client --rm -it --restart='Never' \
--image docker.io/bitnami/spark:3.2.0-debian-10-r2 \
-- run-example \
--name my-pi-calc-example-2 \
--master spark://spark-master-svc:7077 \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 1g \
--driver-cores 4 \
--executor-cores 4 \
--verbose \
SparkPi 3
gives me the following logs on spark-worker-0
:
21/11/15 22:07:42 INFO DriverRunner: Launch Command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx4096M" "-Dspark.master=spark://spark-master-svc:7077" "-Dspark.driver.cores=4" "-Dspark.driver.supervise=false" "-Dspark.submit.deployMode=cluster" "-Dspark.driver.memory=4g" "-Dspark.executor.memory=4g" "-Dspark.submit.pyFiles=" "-Dspark.jars=file:///opt/bitnami/spark/examples/jars/scopt_2.12-3.7.1.jar,file:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.0.jar,file:/opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.0.jar" "-Dspark.rpc.askTimeout=10s" "-Dspark.app.name=my-pi-calc-example-2" "-Dspark.executor.cores=4" "org.apache.spark.deploy.worker.DriverWrapper" "spark://[email protected]:34637" "/opt/bitnami/spark/work/driver-20211115220742-0006/spark-examples_2.12-3.2.0.jar" "org.apache.spark.examples.SparkPi" "3" "--verbose"
21/11/15 22:07:44 INFO Worker: Asked to launch executor app-20211115220744-0006/4 for Spark Pi
21/11/15 22:07:44 INFO SecurityManager: Changing view acls to: spark
21/11/15 22:07:44 INFO SecurityManager: Changing modify acls to: spark
21/11/15 22:07:44 INFO SecurityManager: Changing view acls groups to:
21/11/15 22:07:44 INFO SecurityManager: Changing modify acls groups to:
21/11/15 22:07:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
21/11/15 22:07:44 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx4096M" "-Dspark.driver.port=42013" "-Dspark.rpc.askTimeout=10s" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@spark-worker-0.spark-headless.redacted.svc.cluster.local:42013" "--executor-id" "4" "--hostname" "xx.xx.19.190" "--cores" "4" "--app-id" "app-20211115220744-0006" "--worker-url" "spark://[email protected]:34637"
21/11/15 22:07:48 INFO Worker: Asked to kill executor app-20211115220744-0006/4
21/11/15 22:07:48 INFO ExecutorRunner: Runner thread for executor app-20211115220744-0006/4 interrupted
21/11/15 22:07:48 INFO ExecutorRunner: Killing process!
21/11/15 22:07:48 INFO Worker: Executor app-20211115220744-0006/4 finished with state KILLED exitStatus 143
21/11/15 22:07:48 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 4
21/11/15 22:07:48 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20211115220744-0006, execId=4)
21/11/15 22:07:48 INFO ExternalShuffleBlockResolver: Application app-20211115220744-0006 removed, cleanupLocalDirs = true
21/11/15 22:07:48 INFO Worker: Cleaning up local directories for application app-20211115220744-0006
21/11/15 22:07:48 INFO Worker: Driver driver-20211115220742-0006 exited successfully
I can drop, change, or modify the run-example
or spark-submit
flags. It seemingly has no effect, even for something as simple as SparkPi 3
; the executors are killed and exit code 143 with very little information about why they were actually killed.
Resource constraints should not be an issue whatsoever here. This is a Kubernetes cluster of 3 AWS m5.4xlarge worker nodes, 16 vCPu and 64GiB RAM with very little else actually deployed on the. I have not set Kubernetes spec.resources
on limits
or requests
. The Spark cluster is deployed as follows:
argocd app create spark \
--repo https://charts.bitnami.com/bitnami \
--helm-chart spark \
--dest-server https://kubernetes.default.svc \
--insecure \
--helm-set 'worker.replicaCount=3' \
--dest-namespace redacted \
--revision '5.7.9' \
--helm-set worker.coreLimit=8 \
--helm-set worker.memoryLimit=8G \
--helm-set worker.daemonMemoryLimit=4G \
--helm-set master.daemonMemoryLimit=4G
argocd app sync spark
This uses the Spark Bitnami Helm chart and ArgoCD/Helm to deploy.
The cluster deploys just fine; I can see, for example, Starting Spark worker xxx.xx.xx.xx:46105 with 8 cores, 8.0 GiB RAM
and that all 3 workers have joined.
What am I missing here? How can I better debug this and figure out what the resource constraint is?
Funny enough, I can even run SparkPi locally. If I e.g. kubectl exec -it spark-worker-0 -- bash
:
$ ./bin/run-example SparkPi 3
...
21/11/15 22:22:09 INFO SparkContext: Running Spark version 3.2.0
...
21/11/15 22:22:11 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.634538 s
Pi is roughly 3.1437838126127087
Then I can add two arguments to run in cluster mode, and boom, executors get killed:
$ ./bin/run-example \
--master spark://spark-master-svc:7077 \
--deploy-mode cluster SparkPi
# Executor app-20211115222530-0008/2 finished with state KILLED exitStatus 143
CodePudding user response:
Learned a couple things here. The first is that 143 KILLED does not seem to actually be indicative of failure but rather of executors receiving a signal to shutdown once the job is finished. So, seems draconian when found in logs but is not.
What was confusing me was that I wasn't seeing any "Pi is roughly 3.1475357376786883" text on stdout/stderr. This led me to believe the computation never got that far, which was incorrect.
The issue here is what I was using --deploy-mode cluster
when --deploy-mode client
actually made a lot more sense in this situation. That is because I was running an ad-hoc container through kubectl run
which was not part of the existing deployment. This fits the definition of client mode better, since the submission does not come from an existing Spark worker. When running in --deploy-mode=cluster
, you'll never actually see stdout since input/output of the application are not attached to the console.
Once I changed --deploy-mode
to client
, I also needed to add --conf spark.driver.host
as documented here and here, for the pods to be able to resolve back to the invoking host.
kubectl run -n redacted spark-client --rm -it --restart='Never' \
--image docker.io/bitnami/spark:3.2.0-debian-10-r2 \
-- /bin/bash -c '
run-example \
--name my-pi-calc-example \
--master spark://spark-master-svc:7077 \
--deploy-mode client \
--conf spark.driver.host=$(hostname -i) \
SparkPi 10'
Output:
21/11/15 23:22:16 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
21/11/15 23:22:16 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.961188 s
Pi is roughly 3.140959140959141
21/11/15 23:22:16 INFO SparkUI: Stopped Spark web UI at http://xx.xx.xx.xx:4040
21/11/15 23:22:16 INFO StandaloneSchedulerBackend: Shutting down all executors
21/11/15 23:22:16 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
Interestingly enough this is app-20211115232213-0024
, which in the Spark Master UI still shows each worker as KILLED 143 - reinforcing the conclusion that that is a 'normal' shutdown signal.