I am trying to join two Cassandra tables with:
t1.join(t2, Seq("some column"), "left")
I am getting the below error message:
Exception in thread "main" java.lang.ClassCastException: scala.Tuple8 cannot be cast to scala.Tuple7 at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.apply(CassandraDirectJoinStrategy.scala:27)
I am using cassandra v3.11.13 and Spark 3.3.0. The code dependencies:
libraryDependencies = Seq(
"org.scalatest" %% "scalatest" % "3.2.11" % Test,
"com.github.mrpowers" %% "spark-fast-tests" % "1.0.0" % Test,
"graphframes" % "graphframes" % "0.8.1-spark3.0-s_2.12" % Provided,
"org.rogach" %% "scallop" % "4.1.0" % Provided,
"org.apache.spark" %% "spark-sql" % "3.1.2" % Provided,
"org.apache.spark" %% "spark-graphx" % "3.1.2" % Provided,
"com.datastax.spark" %% "spark-cassandra-connector" % "3.2.0" % Provided)
Your help is greatly appreciated
CodePudding user response:
The Spark Cassandra connector does not support Apache Spark 3.3.0 yet and I suspect that is the reason it's not working though I haven't done any verification myself.
Support for Spark 3.3.0 has been requested in SPARKC-686 but the amount of work required is significant so stay tuned.
The latest supported Spark version is 3.2 using spark-cassandra-connector 3.2. Cheers!