I am following the tutorial here on how to access Delta Lake with Spark, but I can't seem to get it to work.
I have the following dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>Runner</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.2.3</version>
        </dependency>
    </dependencies>
</project>
And my code is:
package org.example;

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class Main {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
                .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
                .config("spark.master", "local")
                .getOrCreate();

        Dataset<Row> df = spark.range(0, 5).toDF();
        df.write().format("delta").save("./tmp/delta-table");
        df.show();
    }
}
But when I run this, it fails with the following error:
Exception in thread "main" java.lang.NoSuchMethodError: 'org.apache.spark.internal.config.ConfigEntry org.apache.spark.sql.internal.SQLConf$.PARQUET_FIELD_ID_READ_ENABLED()'
at io.delta.sql.DeltaSparkSessionExtension.$anonfun$apply$3(DeltaSparkSessionExtension.scala:88)
at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildResolutionRules$1(SparkSessionExtensions.scala:174)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.SparkSessionExtensions.buildResolutionRules(SparkSessionExtensions.scala:174)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.customResolutionRules(BaseSessionStateBuilder.scala:212)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:187)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:179)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:357)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:87)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:87)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:75)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:65)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:205)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:211)
at org.apache.spark.sql.SparkSession.range(SparkSession.scala:550)
at org.apache.spark.sql.SparkSession.range(SparkSession.scala:529)
at org.example.Main.main(Main.java:16)
Googling for a fix did not yield much. Any ideas?
CodePudding user response:
You're using io.delta:delta-core_2.12:2.2.0 with spark-core version 3.2.3, but delta-core 2.2.0 depends on spark-core version 3.3.1.

In your case, it's looking for the PARQUET_FIELD_ID_READ_ENABLED configuration parameter, which was only introduced in Spark 3.3.0. That's what's causing your issue.

Try using Spark version 3.3.1 :)
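For reference, a minimal sketch of the dependency block with the versions aligned, assuming you stay on delta-core 2.2.0 and bump both Spark artifacts to 3.3.1:

    <dependencies>
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>2.2.0</version>
        </dependency>
        <!-- Spark version must match what delta-core 2.2.0 was built against (3.3.x) -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.3.1</version>
        </dependency>
    </dependencies>

In general, the Delta Lake release notes and the delta-core POM on Maven Central list the Spark version each Delta release is compiled against, so it's worth checking there whenever you bump either side.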