Connect to databricks sql using spark and databricks jdbc

Time:08-17

I am trying to read a table from Databricks using the Databricks JDBC driver and Spark:

Dataset<Row> df = spark.read().format("jdbc")
        .option("url", databricks_url)
        .option("driver", "com.databricks.client.spark.jdbc.SparkJDBC42Driver")
        .option("dbtable", "default.age")
        .load();
df.show();

The full stack trace:

Exception in thread "main" java.lang.ClassCastException: com.databricks.client.spark.jdbc.SparkJDBC42Driver cannot be cast to java.sql.Driver
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:54)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:101)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:101)
    at scala.Option.foreach(Option.scala:437)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:101)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:39)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:33)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at SparkTest.main(SparkTest.java:31)

CodePudding user response:

I solved this problem as follows: by installing the ODBC driver from https://www.databricks.com/fr/spark/odbc-drivers-download while keeping the JDBC jar in the libs, by removing the `driver` option from the Spark read (Spark's `DriverRegistry` casts that class to `java.sql.Driver`, which is what throws the `ClassCastException`), and by making the following changes to the URL:

  • replacing `spark` with `databricks` in the URL
  • adding these options:

Driver=Simba; UseNativeQuery=0;

String databricks_url = "jdbc:databricks://adb-xxxxx.azuredatabricks.net:443/default;"
        + "transportMode=http;"
        + "ssl=1;"
        + "httpPath=sql/protocolv1/o/....;"
        + "AuthMech=3;"
        + "UID=token;"
        + "PWD=<my databricks_token>;"
        + "Driver=Simba;"
        + "UseNativeQuery=0;";
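The URL is just a host/port/database prefix followed by semicolon-separated `key=value` properties, so it can also be assembled from its parts. A minimal sketch in plain Python (the host, `httpPath`, and token are the same placeholders as above, not real values):

```python
# Sketch: assemble the Databricks JDBC URL from its key=value properties.
# Host, httpPath and token below are placeholders, as in the answer above.
host = "adb-xxxxx.azuredatabricks.net"

options = {
    "transportMode": "http",
    "ssl": "1",
    "httpPath": "sql/protocolv1/o/....",   # placeholder path
    "AuthMech": "3",                       # 3 = username/password (token) auth
    "UID": "token",
    "PWD": "<my databricks_token>",        # placeholder token
    "Driver": "Simba",
    "UseNativeQuery": "0",
}

databricks_url = (
    f"jdbc:databricks://{host}:443/default;"
    + ";".join(f"{k}={v}" for k, v in options.items())
    + ";"
)
print(databricks_url)
```

This keeps the credentials in one place instead of buried inside a long string literal.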

In PySpark, the query must be placed between single quotes: '

df = spark.read.format("jdbc").option("url", databricks_url) \
.option("query", 'SELECT * FROM table') \
.load()
df.show(vertical=True)
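As a quick sanity check before handing the URL to Spark, the semicolon-separated properties can be parsed back into a dict (plain Python, no Spark required; the URL uses the same placeholders as above):

```python
# Parse the key=value properties embedded in a Databricks JDBC URL.
# Host, httpPath and token are placeholders, as in the answer above.
databricks_url = (
    "jdbc:databricks://adb-xxxxx.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;httpPath=sql/protocolv1/o/....;"
    "AuthMech=3;UID=token;PWD=<my databricks_token>;"
    "Driver=Simba;UseNativeQuery=0;"
)

# Everything after the first ';' is a list of key=value pairs.
_, _, props_part = databricks_url.partition(";")
props = dict(
    pair.split("=", 1) for pair in props_part.split(";") if pair
)
print(props)
```

This makes it easy to spot a missing `AuthMech` or a typo in a property name before the driver rejects the connection.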