Context
Spark's reader has a format function, which is used to specify the data source type, for example JSON, CSV, or a third-party source such as com.databricks.spark.redshift.
Help
How can I check whether a third-party format exists or not? Let me give a case:
- In local Spark, to connect to Redshift, two open-source libraries are available:
  1. com.databricks.spark.redshift
  2. io.github.spark_redshift_community.spark.redshift
- How can I determine which of these libraries the user has put on the classpath?
What I tried
- Class.forName("com.databricks.spark.redshift") did not work.
- I checked the Spark source code to see how it throws the error for an unknown format, but unfortunately the Utils class it uses is not publicly available.
- Instead of targeting the format option, I tried to target the JAR file via System.getProperty("java.class.path").
- I also wrapped spark.read.format("..").load() in a try/catch (roughly as sketched below).
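Those last two attempts looked roughly like this (a sketch, assuming a spark-shell session where spark is in scope; the format name and catch logic are illustrative):

// Inspect the JVM classpath for a Redshift JAR
val classpath = System.getProperty("java.class.path")
classpath.split(java.io.File.pathSeparator).filter(_.contains("redshift")).foreach(println)

// Probe the format by attempting a read
try {
  spark.read.format("com.databricks.spark.redshift").load()
} catch {
  case e: Exception => println(s"format not usable: ${e.getMessage}")
}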
I am looking for a proper and reliable solution.
CodePudding user response:
May this answer help you.
To check only whether a Spark format exists or not,
spark.read.format("..").load()
in a try/catch is enough.
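A minimal sketch of that check (the helper name formatExists is mine, and the exact exception Spark throws for an unknown source varies across versions, so the catch clauses are deliberately broad):

import org.apache.spark.sql.SparkSession

// Returns true when Spark can resolve the data source for `format`.
// A registered format may still fail inside load() (e.g. missing connection
// options), so only "source not found" failures count as non-existent.
def formatExists(spark: SparkSession, format: String): Boolean =
  try {
    spark.read.format(format).load()
    true
  } catch {
    case _: ClassNotFoundException => false
    case e: Exception if e.getMessage != null &&
        e.getMessage.contains("Failed to find data source") => false
    case _: Exception => true // the source resolved; load() failed for another reason
  }

// Example: formatExists(spark, "io.github.spark_redshift_community.spark.redshift")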
And since all data sources usually register themselves using the DataSourceRegister interface (and use shortName to provide their alias), you can use Java's ServiceLoader.load method to find all registered implementations of the DataSourceRegister interface:
import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Load every DataSourceRegister implementation registered on the classpath
val formats = ServiceLoader.load(classOf[DataSourceRegister])

// Print each registered source's alias, e.g. "csv", "json", "redshift"
formats.asScala.map(_.shortName).foreach(println)
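If the goal is to tell which of the two Redshift libraries is on the classpath, you can inspect the implementing class of each registered source rather than just its alias. A sketch, with two caveats: it assumes the libraries register themselves through DataSourceRegister, and the DefaultSource class names in the fallback follow the usual naming convention (verify them against the JAR you actually ship):

import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Fully qualified class names of every provider registered on the classpath
val providers = ServiceLoader.load(classOf[DataSourceRegister])
  .asScala.map(_.getClass.getName).toSet

val hasDatabricksLib = providers.exists(_.startsWith("com.databricks.spark.redshift"))
val hasCommunityLib  = providers.exists(_.startsWith("io.github.spark_redshift_community"))

// Fallback for sources that do not register themselves: probe the provider
// class directly. Note that Class.forName needs a class name, not a package
// name, which is why Class.forName("com.databricks.spark.redshift") failed.
def classExists(name: String): Boolean =
  try { Class.forName(name); true } catch { case _: Throwable => false }

classExists("com.databricks.spark.redshift.DefaultSource")
classExists("io.github.spark_redshift_community.spark.redshift.DefaultSource")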