I keep running into this error:
java.lang.ClassNotFoundException: Failed to find data source: iceberg. Please find packages at https://spark.apache.org/third-party-projects.html
I am trying to include the org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.1.0
package as part of my Spark code, because I want to be able to write unit tests locally. I have tried several things:
- Including the package as part of my SparkSession builder:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
conf.set("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.1.0")

val sparkSession: SparkSession =
  SparkSession
    .builder()
    .appName(getClass.getSimpleName)
    .config(conf = conf)
    // ... the rest of my config
    .master("local[*]")
    .getOrCreate()
This does not work; I get the same error. I also tried passing the configuration string directly to the SparkSession builder, and that didn't work either.
- Downloading the jar myself. I really don't want to do this because I want the process to be automated, but even then, when I set "spark.jars" to point to the downloaded jar, Spark cannot find it for some reason (roughly as in the sketch after this list).
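For reference, the second attempt looked roughly like this (the jar path is just a placeholder for wherever the runtime jar was downloaded):

import org.apache.spark.sql.SparkSession

// Point spark.jars at the locally downloaded Iceberg runtime jar (placeholder path).
val sparkSession: SparkSession =
  SparkSession
    .builder()
    .appName(getClass.getSimpleName)
    .config("spark.jars", "/path/to/iceberg-spark-runtime-3.2_2.12-1.1.0.jar")
    .master("local[*]")
    .getOrCreate()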
Can anybody help me figure this out?
CodePudding user response:
You can create an uber/fat jar and put all your dependencies in that jar.
Let's say you want to use Iceberg in your Spark application.
Create a pom.xml file and add the dependency to the dependencies section:
<dependencies>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-spark-runtime-3.2_2.12</artifactId>
    <version>1.1.0</version>
  </dependency>
</dependencies>
Building with a fat-jar plugin such as maven-shade-plugin or maven-assembly-plugin will then produce an uber jar with that dependency baked into it. You can deploy that jar via spark-submit and the bundled libraries will be picked up automatically.
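If the project is built with sbt rather than Maven (an assumption here, since the question only shows Scala code), the sbt-assembly plugin plays the same role; a minimal sketch, with an illustrative plugin version:

// project/plugins.sbt -- sbt-assembly bundles the project and its dependencies into one jar
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

Running sbt assembly then produces the fat jar under target/, and that jar can be passed to spark-submit just like the Maven-built one.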
CodePudding user response:
It seems spark.jars.packages is only read when spark-shell starts up. That means you can still change it inside a running spark-shell session via SparkSession or SparkConf, but the new value will not be processed, so the package is never loaded.
For a self-contained Scala application, you can instead add the required dependencies to build.sbt, for example:
libraryDependencies ++= Seq(
  "org.mongodb.spark" %% "mongo-spark-connector" % "10.0.5",
  "org.apache.spark" %% "spark-core" % "3.0.2",
  "org.apache.spark" %% "spark-sql" % "3.0.2"
)
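For the Iceberg case in the question, the corresponding sbt dependency would be the runtime the question already names (whether it belongs in the default or Test scope depends on your project):

// Single % because the artifact id already carries the Scala version suffix.
libraryDependencies += "org.apache.iceberg" % "iceberg-spark-runtime-3.2_2.12" % "1.1.0"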