Check whether a Spark format exists or not


Context

The Spark reader has a format function, which is used to specify the data source type, for example JSON, CSV, or a third-party source such as com.databricks.spark.redshift.
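
For example (a minimal sketch; the path and options are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// The string passed to format() selects the data source implementation,
// either a built-in alias ("json", "csv") or a fully qualified class name
val df = spark.read.format("csv").option("header", "true").load("/tmp/example.csv")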

Help

How can I check whether a third-party format exists or not? Let me give a case:

  • In local Spark, to connect to Redshift there are two open-source libs available: 1. com.databricks.spark.redshift and 2. io.github.spark_redshift_community.spark.redshift. How can I determine which lib the user has put on the classpath?

What I tried

  • Class.forName("com.databricks.spark.redshift"), which did not work (it is a package name, not a loadable class name)
  • I checked the Spark source code to see how it throws the "failed to find data source" error, but unfortunately the Utils class it uses is not publicly accessible
  • Instead of targeting the format option, I tried targeting the JAR files via System.getProperty("java.class.path") (sketched just after this list)
  • Wrapping spark.read.format("..").load() in a try/catch
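
Roughly what the classpath probe looked like (a sketch; the jar-name substring "spark-redshift" is an assumption):

import java.io.File

// Sketch of the classpath-scanning attempt: look for a Redshift jar by
// file name. Fragile, since jar names do not have to match package names
val jars = System.getProperty("java.class.path").split(File.pathSeparator)
val hasRedshiftJar = jars.exists(_.toLowerCase.contains("spark-redshift"))
println(s"Redshift jar on classpath: $hasRedshiftJar")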

I am looking for a proper and reliable solution.

CodePudding user response:

I hope this answer helps you.

To merely check whether a Spark format exists or not,

spark.read.format("..").load() in try/catch

is enough.
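
For instance, a minimal sketch (the helper name formatExists is hypothetical; it assumes, as current Spark versions do, that a missing data source surfaces as a ClassNotFoundException):

import org.apache.spark.sql.SparkSession

// Hypothetical helper: probe a format by attempting a load and inspecting
// the failure. A missing data source surfaces as a ClassNotFoundException
// ("Failed to find data source: ..."); any other exception (for example,
// a required path is missing) means the format itself was resolved
def formatExists(spark: SparkSession, format: String): Boolean =
  try {
    spark.read.format(format).load()
    true
  } catch {
    case _: ClassNotFoundException => false
    case _: Throwable              => true
  }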

And since data sources usually register themselves using the DataSourceRegister interface (exposing their alias via shortName), you can use Java's ServiceLoader.load method to find all registered implementations of DataSourceRegister:

import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Discover every data source implementation registered on the classpath
val formats = ServiceLoader.load(classOf[DataSourceRegister])

// Print the alias (shortName) each data source registers itself under
formats.asScala.map(_.shortName).foreach(println)
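
Building on that, here is a sketch of how to answer the original question (which Redshift connector is on the classpath), assuming both libraries register themselves via DataSourceRegister; the implementing class's package prefix tells the two apart:

import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Map each registered alias to the fully qualified class providing it
val providers: Map[String, String] =
  ServiceLoader.load(classOf[DataSourceRegister])
    .asScala
    .map(r => r.shortName -> r.getClass.getName)
    .toMap

// Assumption: each connector's classes live under its library's package
// prefix, so the prefix identifies which library is present
providers.collect {
  case (alias, cls) if cls.startsWith("com.databricks.spark.redshift") =>
    s"$alias -> Databricks connector ($cls)"
  case (alias, cls) if cls.startsWith("io.github.spark_redshift_community") =>
    s"$alias -> community connector ($cls)"
}.foreach(println)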