We are trying to install the Kubernetes Spark operator and write a sample SparkApplication that connects to S3 and writes a file. However, whatever we do, we cannot get rid of the error below:
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23/01/15 15:00:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoSuchMethodError: 'char[] org.apache.hadoop.conf.Configuration.getPassword(java.lang.String)'
at org.apache.spark.SSLOptions$.$anonfun$parse$8(SSLOptions.scala:188)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:188)
at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:98)
at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.secMgr$1(SparkSubmit.scala:368)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark application creation process for Spark operator:
- Created the base image for Spark
$ cd spark-3.1.1-bin-hadoop3.2
$ ./bin/docker-image-tool.sh -r <registryurl>/nks/sparkoperator/base -t 3.1.1 -u 1000 -b java_image_tag=11-jre-slim build
This created the base image, which was then pushed to the Artifactory registry as
<registryurl>/nks/sparkoperator/base/spark:3.1.1
- Created the folder structure, Dockerfile, build.sbt, and the application file for the actual application (a sketch of ParquetAWSExample.scala follows the tree below)
.
├── Dockerfile
├── build.sbt
├── plugins.sbt
└── src
└── main
└── scala
└── com
└── company
└── xyz
└── ParquetAWSExample.scala
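The contents of ParquetAWSExample.scala are not reproduced in the question. For reference, a minimal sketch of what such an app might look like is shown here; the bucket name, output path, and the way credentials are supplied are assumptions, not the original code.

package com.company.xyz

import org.apache.spark.sql.SparkSession

object ParquetAWSExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetAWSExample")
      .getOrCreate()

    // Use the s3a:// connector from hadoop-aws; credentials could instead come
    // from environment variables, an instance profile, or a mounted secret.
    spark.sparkContext.hadoopConfiguration
      .set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    import spark.implicits._
    val df = Seq((1, "foo"), (2, "bar")).toDF("id", "value")

    // "my-bucket" is a placeholder; replace with the real bucket and prefix.
    df.write.mode("overwrite").parquet("s3a://my-bucket/test-output/")

    spark.stop()
  }
}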
Dockerfile
FROM <registryurl>/nks/sparkoperator/base/spark:3.1.1
USER root
RUN apt-get update && apt-get install -y wget
ARG SBT_VERSION
ENV SBT_VERSION=${SBT_VERSION:-1.5.1}
RUN wget -O - https://github.com/sbt/sbt/releases/download/v${SBT_VERSION}/sbt-${SBT_VERSION}.tgz | gunzip | tar -x -C /usr/local
#WORKDIR /spark
ENV PATH /usr/local/sbt/bin:${PATH}
WORKDIR /app
COPY . /app
ADD plugins.sbt /app/project/
RUN sbt update
RUN sbt clean assembly
This builds the Docker image <registryurl>/nks/testsparkoperatorv2/s3conn:1.5
build.sbt
name := "xyz"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.1",
  "org.apache.spark" %% "spark-sql" % "3.1.1",
  "org.apache.hadoop" % "hadoop-aws" % "3.1.1"
)

dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.1.1"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
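plugins.sbt is referenced in the Dockerfile but not shown. Since build.sbt uses sbt-assembly's merge strategy, it presumably contains something like the following; the exact plugin version is an assumption.

plugins.sbt

// sbt-assembly provides the `assembly` task and MergeStrategy used in build.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")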
spark-application.yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: parquet.test
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "<registryurl>/nks/testsparkoperatorv2/s3conn:1.5"
  imagePullPolicy: Always
  imagePullSecrets:
    - myregistrykey
  mainClass: com.company.xyz.ParquetAWSExample
  mainApplicationFile: "local:///app/target/scala-2.12/xyz-assembly-0.1.jar"
  sparkVersion: "3.1.1"
  driver:
    memory: 512m
    labels:
      version: 3.1.1
    serviceAccount: sparkoperator
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: 512m
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
CodePudding user response:
This looks like a version mismatch between Spark and Hadoop. The spark-3.1.1-bin-hadoop3.2 distribution is built against Hadoop 3.2, so the Hadoop client jars (hadoop-aws, hadoop-common) on the classpath should be aligned with that version rather than 3.1.1.
Try this build.sbt file:
name := "xyz"
version := "0.1"
scalaVersion := "2.12.11"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.1",
  "org.apache.spark" %% "spark-sql" % "3.1.1",
  "org.apache.hadoop" % "hadoop-aws" % "3.2.0"
)

dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.2.0"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
Also, maybe try adding this dependency:
"org.apache.hadoop" % "hadoop-hdfs-client" % "3.2.0",
CodePudding user response:
Found the solution. I was indeed running the command below from within the spark-3.1.1-bin-hadoop3.2 folder to build the base image.
$ ./bin/docker-image-tool.sh \
-r <registryurl>/nks/sparkoperator/base \
-t 3.1.1 \
-u 1000 \
-b java_image_tag=11-jre-slim build
However, it picked up the default Spark installed on my system, which was an older version bundling a kubernetes-client jar of version 5.4.1 that wasn't compatible with our Kubernetes version (1.22).
Therefore, I set SPARK_HOME to /spark-3.1.1-bin-hadoop3.2, rebuilt the base image, and everything worked afterwards.
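In other words, something along these lines (the path is the one mentioned above; adjust it for your environment):

$ export SPARK_HOME=/spark-3.1.1-bin-hadoop3.2
$ cd $SPARK_HOME
$ ./bin/docker-image-tool.sh \
    -r <registryurl>/nks/sparkoperator/base \
    -t 3.1.1 \
    -u 1000 \
    -b java_image_tag=11-jre-slim build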