Delta Lake error on DeltaTable.forName in k8s cluster mode: cannot assign instance of java.lang.invoke.SerializedLambda

Time: 10-08

I am trying to merge some data into a Delta table from a streaming application on k8s, using spark-submit in cluster mode.

I am getting the error below. Everything works fine in k8s local mode and on my laptop, but none of the Delta Lake operations work in k8s cluster mode.

Below are the library versions I am using; is this some compatibility issue?

SPARK_VERSION_DEFAULT=3.3.0
HADOOP_VERSION_DEFAULT=3
HADOOP_AWS_VERSION_DEFAULT=3.3.1
AWS_SDK_BUNDLE_VERSION_DEFAULT=1.11.974
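For reference, the submit command looks roughly like this (a sketch, not the exact command; the API server address, image name, and application path are placeholders):

```shell
# Sketch of the cluster-mode submit. In cluster mode the driver runs inside
# k8s, so any jars referenced here must also be reachable from the executors,
# not just from the machine running spark-submit.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  local:///opt/app/streaming_app.py
```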

Below is the error message:

py4j.protocol.Py4JJavaError: An error occurred while calling o128.saveAsTable. : java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4) (192.168.15.250 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF

CodePudding user response:

scala version: 2.12.11

spark: 2.4.6

delta-core_2.12: 0.7.0

delta-sql_2.12: 0.7.0

hadoop-azure: 2.7.6

hadoop-azure-datalake: 3.0.0

org.apache.spark.sql.AnalysisException: Datasource does not support the operation;
Caused by: org.apache.spark.sql.AnalysisException: Datasource does not support the operation;
    at org.apache.spark.sql.catalyst.analysis.UnresolvedRelation.$anonfun$checkOperationsSupport$1(UnresolvedRelation.scala:269)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedRelation.$anonfun$checkOperationsSupport$1$adapted(UnresolvedRelation.scala:268)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedRelation.checkOperationsSupport(UnresolvedRelation.scala:268)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedRelation.$anonfun$resolveOperations$1(UnresolvedRelation.scala:275)
    at org.apache.spark.sql.catalyst.analysis.

CodePudding user response:

Finally able to resolve this issue. The cause was that, for some reason, dependent jars such as delta and kafka were not available on the executors, as described in the SO answer below:

cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.sql.execution.datasources.v2.DataSourceRDD

I added the jars to the spark/jars folder via the Docker image, and the issue was resolved.
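Concretely, that amounts to something like the following in the Dockerfile that builds the Spark image. The base image tag and artifact versions below are examples, not taken from my setup; pick the delta-core release that matches your Spark/Scala build (e.g. delta-core_2.12 2.1.x for Spark 3.3, plus its delta-storage companion jar):

```dockerfile
# Sketch: bake the dependency jars into the image so the executors,
# not only the driver, have them on the classpath under $SPARK_HOME/jars.
# Versions are examples -- match them to your Spark build.
FROM apache/spark:3.3.0

ADD https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.1.0/delta-core_2.12-2.1.0.jar \
    /opt/spark/jars/
ADD https://repo1.maven.org/maven2/io/delta/delta-storage/2.1.0/delta-storage-2.1.0.jar \
    /opt/spark/jars/
ADD https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/3.3.0/spark-sql-kafka-0-10_2.12-3.3.0.jar \
    /opt/spark/jars/
```

Placing the jars in spark/jars makes them part of the default classpath of every driver and executor container started from this image, which is why the SerializedLambda cast error disappears in cluster mode.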
