Spark 3.1.3 with Spark Cassandra Connector 3.1.0 is giving ClassNotFoundException on spark-submit


I have used Spark 3.1.3 to connect to Astra as well as to a local Cassandra server, but I am getting a java.lang.ClassNotFoundException on spark-submit. I have confirmed that the same thing happens with Spark versions above 3.x; this code works fine with Spark 2.4.2.

Here is my Main.scala:

import com.datastax.spark.connector.toSparkContextFunctions
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}


object Main {

  def main(args: Array[String]): Unit = {
    // Astra connection settings: secure connect bundle plus client credentials
    val conf = new SparkConf()
      .set("spark.cassandra.connection.config.cloud.path", "/path/astradb-secure-connect.zip")
      .set("spark.cassandra.auth.username", "client-id")
      .set("spark.cassandra.auth.password", "client-secret")
      .setAppName("SparkTest")
    val sparkCTX = new SparkContext(conf)

    val sparkSess = SparkSession.builder.appName("MyStream").getOrCreate()
    sparkCTX.setLogLevel("ERROR")
    println("\n\n\n\n************\n\n\n\n")

    val rdd = sparkCTX.cassandraTable("my_keyspace", "accounts") // <- exact line of the error
    rdd.foreach(s => {
      println(s)
    })
  }
}

My build.sbt looks like:

name := "Synchronization"

version := "1.0-SNAPSHOT"


scalaVersion := "2.12.15"

idePackagePrefix := Some("info.myapp.synchronization")


val sparkVersion = "3.1.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "io.spray" %% "spray-json" % "1.3.6",
  "org.scalaj" %% "scalaj-http" % "2.4.2"
)

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0" % "provided"

libraryDependencies += "com.twitter" % "jsr166e" % "1.1.0"

libraryDependencies += "net.liftweb" %% "lift-json" % "3.4.3"

libraryDependencies += "com.sun.mail" % "javax.mail" % "1.6.2"

libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.22"

libraryDependencies += "com.github.jurajburian" %% "mailer" % "1.2.4"

The code I am using to run this is:

$ sbt package
$ spark-submit --class "Main" --jars $(echo localDependencies/*.jar | tr ' ' ',') target/scala-2.12/*.jar

My localDependencies folder contains jars downloaded from mvnrepository.

My error goes like:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/CassandraRow
        at Main$.main(Main.scala:20)
        at Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.CassandraRow
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 14 more

Please help me fix this issue, or suggest a combination of versions that works.

CodePudding user response:

If you look at the connector's dependencies on Maven Repository, you will notice that it depends on com.datastax.spark:spark-cassandra-connector-driver_2.12, which is missing from your list of jars. To simplify submission, it is easier to use spark-cassandra-connector-assembly, which includes all the necessary classes.
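For example, here is a minimal sketch of two ways to submit with the assembly artifact (the coordinates and jar file name below assume the 3.1.0 assembly published under com.datastax.spark on Maven Central; adjust the version and path to your setup):

# Option 1: let spark-submit resolve the assembly from Maven Central
$ spark-submit --class "Main" \
    --packages com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0 \
    target/scala-2.12/*.jar

# Option 2: pass a locally downloaded assembly jar explicitly
$ spark-submit --class "Main" \
    --jars localDependencies/spark-cassandra-connector-assembly_2.12-3.1.0.jar \
    target/scala-2.12/*.jar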

CodePudding user response:

I have found a workaround. The solution is to use sbt assembly instead of sbt package and resolve the merge conflicts. However, if there is any way to make sbt package work here, please add it as an answer.

So, here is what I did:

In the folder named project under the project root directory, there should be a plugins.sbt file.

|project root
|---project
|     |----plugins.sbt        // this file
|     |----build.properties
|     |----target
|     |----project
|---src
|---target
|     |----scala-2.1x
|            |---xxxx.jar     // the built jar file
|---build.sbt                 // add the merge-strategy code here
|---default.properties

In plugins.sbt, add this line:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

Then, in build.sbt in the project root directory, add these lines to resolve the merge conflicts:

assemblyJarName in assembly := "my-built-fatjar-1.0.jar"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard // drop duplicate META-INF entries
  case x => MergeStrategy.first                               // otherwise keep the first occurrence
}

Note:

1. The editor may flag this merge-strategy code as a syntax error, and having it in build.sbt may also cause sbt package to fail with an error, but it works with sbt assembly. The output jar will be at project root/target/scala-2.1x/my-built-fatjar-1.0.jar.

2. sbt assembly takes several times longer than sbt package to build the jar. For development, use sbt ~assembly instead: it runs the build in continuous mode, watching for any change in the source code and updating the jar almost instantly.
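For reference, a minimal sketch of the resulting workflow (one caveat: if the connector is still marked "provided" in build.sbt, sbt assembly will exclude it from the fat jar, so drop that qualifier first; --jars is then no longer needed because the fat jar bundles the connector):

$ sbt assembly
$ spark-submit --class "Main" target/scala-2.12/my-built-fatjar-1.0.jar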
