I'm trying to submit around 20 Spark applications at once, and most of them fail. How do I stop this from happening? The spark-operator pod is not running out of memory. CPU usage does increase, but only for a very short period, and the pod doesn't restart because of these jobs.
Logs -
10 controller.go:184] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was added, enqueuing it for submission
10 controller.go:184] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was added, enqueuing it for submission
10 controller.go:184] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was added, enqueuing it for submission
10 controller.go:184] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was added, enqueuing it for submission
10 controller.go:263] Starting processing key: "spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1"
10 sparkui.go:282] Creating a service sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1-ui-svc for the Spark UI for application sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1
10 event.go:282] Event(v1.ObjectReference{Kind:"SparkApplication", Namespace:"spark", Name:"sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1", UID:"3867b989-71e6-4e47-88e9-e9d88618e269", APIVersion:"sparkoperator.k8s.io/v1beta2", ResourceVersion:"380961510", FieldPath:""}): type: 'Normal' reason: 'SparkApplicationAdded' SparkApplication sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was added, enqueuing it for submission
10 controller.go:184] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was added, enqueuing it for submission
10 sparkui.go:148] Creating an Ingress sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1-ui-ingress for the Spark UI for application sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1
10 submission.go:65] spark-submit arguments: [/opt/spark/bin/spark-submit --class xyz --master ... ]
10 controller.go:728] failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/08/30 19:41:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/08/30 19:41:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/08/30 19:41:36 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
I0830 19:42:00.711350 10 controller.go:822] Update the status of SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 from:
{
"lastSubmissionAttemptTime": null,
"terminationTime": null,
"driverInfo": {},
"applicationState": {
"state": ""
}
}
to:
{
"lastSubmissionAttemptTime": "2022-08-30T19:42:00Z",
"terminationTime": null,
"driverInfo": {},
"applicationState": {
"state": "SUBMISSION_FAILED",
"errorMessage": "failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: WARNING: An illegal reflective access operation has occurred\nWARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)\nWARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform\nWARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\nWARNING: All illegal access operations will be denied in a future release\n22/08/30 19:41:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nUsing Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\n22/08/30 19:41:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file\n22/08/30 19:41:36 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.\n"
},
"submissionAttempts": 1
}
I0830 19:42:00.712173 10 event.go:282] Event(v1.ObjectReference{Kind:"SparkApplication", Namespace:"spark", Name:"sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1", UID:"3867b989-71e6-4e47-88e9-e9d88618e269", APIVersion:"sparkoperator.k8s.io/v1beta2", ResourceVersion:"380961510", FieldPath:""}): type: 'Warning' reason: 'SparkApplicationSubmissionFailed' failed to submit SparkApplication sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/08/30 19:41:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/08/30 19:41:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/08/30 19:41:36 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
I0830 19:42:00.723920 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.724098 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.724154 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.724353 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.811873 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.812538 10 controller.go:270] Ending processing key: "spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1"
I0830 19:42:00.812567 10 controller.go:263] Starting processing key: "spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1"
I0830 19:42:00.812839 10 controller.go:822] Update the status of SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 from:
{
"lastSubmissionAttemptTime": "2022-08-30T19:42:00Z",
"terminationTime": null,
"driverInfo": {},
"applicationState": {
"state": "SUBMISSION_FAILED",
"errorMessage": "failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: WARNING: An illegal reflective access operation has occurred\nWARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)\nWARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform\nWARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\nWARNING: All illegal access operations will be denied in a future release\n22/08/30 19:41:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nUsing Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\n22/08/30 19:41:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file\n22/08/30 19:41:36 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.\n"
},
"submissionAttempts": 1
}
to:
{
"lastSubmissionAttemptTime": "2022-08-30T19:42:00Z",
"terminationTime": null,
"driverInfo": {},
"applicationState": {
"state": "FAILED",
"errorMessage": "failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: WARNING: An illegal reflective access operation has occurred\nWARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)\nWARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform\nWARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\nWARNING: All illegal access operations will be denied in a future release\n22/08/30 19:41:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nUsing Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\n22/08/30 19:41:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file\n22/08/30 19:41:36 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.\n"
},
"submissionAttempts": 1
}
I0830 19:42:00.813582 10 event.go:282] Event(v1.ObjectReference{Kind:"SparkApplication", Namespace:"spark", Name:"sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1", UID:"3867b989-71e6-4e47-88e9-e9d88618e269", APIVersion:"sparkoperator.k8s.io/v1beta2", ResourceVersion:"380963223", FieldPath:""}): type: 'Warning' reason: 'SparkApplicationFailed' SparkApplication sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 failed: failed to run spark-submit for SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/08/30 19:41:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/08/30 19:41:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/08/30 19:41:36 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
I0830 19:42:00.824101 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.824213 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.824904 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:00.824802 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:01.011831 10 controller.go:270] Ending processing key: "spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1"
I0830 19:42:01.011938 10 controller.go:223] SparkApplication spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1 was updated, enqueuing it
I0830 19:42:01.011995 10 controller.go:263] Starting processing key: "spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1"
I0830 19:42:01.012207 10 controller.go:270] Ending processing key: "spark/sch-3a44a9db-7993-413e-2022-08-29t18-30-00tz00-00-1"
Answer:
The issue was that the spark-operator pod did not have enough CPU and memory. For each submission, a JVM is started inside the spark-operator pod. If the pod's resource limits are too low, Kubernetes kills these JVMs, resulting in failed spark-submits.
Fixed this by simply removing the limits on CPU and memory in the Helm chart.
The chart mentions the issue here -
# Note, that each job submission will spawn a JVM within the Spark Operator Pod using "/usr/local/openjdk-11/bin/java -Xmx128m".
# Kubernetes may kill these Java processes at will to enforce resource limits. When that happens, you will see the following error:
# 'failed to run spark-submit for SparkApplication [...]: signal: killed' - when this happens, you may want to increase memory limits.
resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 300Mi
  # requests:
  #   cpu: 100m
  #   memory: 300Mi
Even though the chart comment says each JVM is started with -Xmx128m (a 128 MB maximum heap), the actual memory used while submitting about 20 applications was only around 400 MB. CPU usage was about 1.5 cores.
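If you would rather set explicit limits than remove them, a rough back-of-the-envelope check like the one below may help pick a starting point. It is only a sketch: the constants come from the numbers in this post, the function name is made up for illustration, and the bound covers JVM heap only (non-heap JVM memory and the operator's own process are not counted), so treat the result as a guide rather than a guarantee.

```python
# Rough sizing sketch for the spark-operator pod's memory limit.
# Assumption: every concurrent spark-submit JVM could grow to its -Xmx128m heap cap.
# Heap only: non-heap JVM memory and the operator process itself are not included.

HEAP_CAP_MIB = 128           # from the chart comment: java -Xmx128m
CONCURRENT_SUBMISSIONS = 20  # applications submitted at once (from the question)
OBSERVED_MIB = 400           # approximate usage reported in the answer
EXAMPLE_LIMIT_MIB = 300      # the commented-out example limit in the chart


def worst_case_heap_mib(submissions: int, heap_cap_mib: int = HEAP_CAP_MIB) -> int:
    """Heap total if every submission JVM fills its heap at the same time."""
    return submissions * heap_cap_mib


if __name__ == "__main__":
    bound = worst_case_heap_mib(CONCURRENT_SUBMISSIONS)
    print(f"worst-case heap for {CONCURRENT_SUBMISSIONS} submissions: {bound} MiB")
    print(f"observed usage: ~{OBSERVED_MIB} MiB")
    print(f"observed usage exceeds the {EXAMPLE_LIMIT_MIB}Mi example limit: "
          f"{OBSERVED_MIB > EXAMPLE_LIMIT_MIB}")
```

The observed ~400 MB is far below the worst-case heap total but still above the 300Mi example limit in the chart, which is consistent with submissions being killed when that limit is applied.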