java.lang.NoClassDefFoundError: scala/Product$class using read function from PySpark

Time:12-02

I'm new to PySpark, and I'm just trying to read a table from my redshift bank.

The code looks like the following:

import findspark
findspark.add_packages("io.github.spark-redshift-community:spark-redshift_2.11:4.0.1")
findspark.init()

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Dim_Customer").getOrCreate()
df_read_1 = spark.read \
    .format("io.github.spark_redshift_community.spark.redshift") \
    .option("url", "jdbc:redshift://fake_ip:5439/fake_database?user=fake_user&password=fake_password") \
    .option("dbtable", "dim_customer") \
    .option("tempdir", "https://bucket-name.s3.region-code.amazonaws.com/") \
    .load()

I'm getting the error: java.lang.NoClassDefFoundError: scala/Product$class

I'm using Spark version 3.2.2 with Python 3.9.7

Could someone help me, please? Thank you in advance!

CodePudding user response:

You're using the wrong version of the spark-redshift connector: your version is built for Spark 2.4, which uses Scala 2.11, while you need the version for Spark 3, which uses Scala 2.12. Change the version to 5.1.0, which was released recently (all released versions are listed here).
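To make the mismatch concrete: the `_2.11` suffix in the artifact name is the Scala version the connector was compiled against, and it must match the Scala version your Spark build uses. A small sketch of that rule (the helper function is hypothetical, for illustration only; the two coordinates are real releases of the community connector):

```python
# Hypothetical helper: choose the spark-redshift connector coordinate
# that matches a given Spark version.
# Spark 2.4 builds use Scala 2.11; Spark 3.x builds use Scala 2.12.
def redshift_connector_coordinate(spark_version: str) -> str:
    major = int(spark_version.split(".")[0])
    if major >= 3:
        # Scala 2.12 build for Spark 3.x
        return "io.github.spark-redshift-community:spark-redshift_2.12:5.1.0"
    # Scala 2.11 build for Spark 2.4
    return "io.github.spark-redshift-community:spark-redshift_2.11:4.0.1"

print(redshift_connector_coordinate("3.2.2"))
# For Spark 3.2.2 this picks the _2.12 artifact, which is what the
# questioner's findspark.add_packages(...) call should use instead.
```

Passing the wrong-Scala artifact is exactly what produces `NoClassDefFoundError: scala/Product$class`, because `Product$class` only exists in Scala 2.11 bytecode.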
