Error while using Crealytics package to read Excel file

Time:03-03

I'm trying to read an Excel file from an HDFS location using the Crealytics spark-excel package and keep getting an error (Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.connector.catalog.TableProvider). My code is below. Any tips? When I run it, the Spark session starts fine and the Crealytics package loads without error. The error only occurs when the "spark.read" call runs. The file location I'm using is correct.

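from pyspark import SparkConf
from pyspark.sql import SparkSession

def spark_session(spark_conf):
    # Copy every key/value pair from the dict into a SparkConf.
    conf = SparkConf()
    for (key, val) in spark_conf.items():
        conf.set(key, val)

    # Build (or reuse) a Hive-enabled session with that configuration.
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .config(conf=conf) \
        .getOrCreate()
    return spark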

spark_conf = {"spark.executor.memory": "16g", 
              "spark.yarn.executor.memoryOverhead": "3g",
              "spark.dynamicAllocation.initialExecutors": 2,
              "spark.driver.memory": "16g", 
              "spark.kryoserializer.buffer.max": "1g",
              "spark.driver.cores": 32,
              "spark.executor.cores": 8,
              "spark.yarn.queue": "adhoc",
              "spark.app.name": "CDSW_basic",
              "spark.dynamicAllocation.maxExecutors": 32,
              "spark.jars.packages": "com.crealytics:spark-excel_2.12:0.14.0"
             }


spark = spark_session(spark_conf)

df = spark.read.format("com.crealytics.spark.excel") \
          .option("useHeader", "true") \
          .load("/user/data/Block_list.xlsx")

I've also tried loading the package outside of the session function with the code below, which yields the same error once I try to read the file.

import os

crealytics_driver_loc = "com.crealytics:spark-excel_2.12:0.14.0"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages ' + crealytics_driver_loc + ' pyspark-shell'
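
For reference, the environment-variable route only has an effect if the variable is set before PySpark launches its JVM, that is, before the first SparkSession is created in the process. A minimal sketch of building the value follows; the helper name `build_submit_args` is hypothetical and not part of PySpark.

```python
import os


def build_submit_args(packages):
    """Return a PYSPARK_SUBMIT_ARGS value for a list of Maven coordinates.

    Hypothetical helper for illustration only. --packages takes a
    comma-separated list, and the value has to end with "pyspark-shell".
    """
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must run before any SparkSession or SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(
    ["com.crealytics:spark-excel_2.12:0.14.0"]
)
```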

CodePudding user response:

Looks like I'm answering my own question. After a great deal of fiddling around, I've found that an older version of the Crealytics package works with my setup, though I'm not sure why. The version that worked was 0.13.0 ("com.crealytics:spark-excel_2.12:0.13.0"), though the newest is 0.15.
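
A possible explanation, offered as a guess rather than a confirmed diagnosis: the missing class, org.apache.spark.sql.connector.catalog.TableProvider, belongs to the data source API that ships with Spark 3.0 and later, so the error would be consistent with a cluster running Spark 2.x. Under that assumption, the package version could be chosen from the Spark version. The sketch below uses a hypothetical helper, `spark_excel_coordinate`, and the 3.0 cutoff is inferred from this thread only, not taken from the spark-excel documentation.

```python
def spark_excel_coordinate(spark_version, scala_binary="2.12"):
    """Pick a spark-excel Maven coordinate for a given Spark version string.

    Hypothetical helper. Assumption: 0.14.0 needs Spark 3.x classes, while
    0.13.0 still loads on Spark 2.x (the behaviour reported in this thread).
    """
    major = int(spark_version.split(".")[0])
    release = "0.14.0" if major >= 3 else "0.13.0"
    return "com.crealytics:spark-excel_{}:{}".format(scala_binary, release)


# Example: a Spark 2.4 cluster would get the older package.
coordinate = spark_excel_coordinate("2.4.0")
spark_conf_packages = {"spark.jars.packages": coordinate}
```

On a running session the version is available as spark.version, which makes it easy to check whether the cluster is on Spark 2.x before picking a package.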
