I am trying to read a table from Databricks using the Databricks JDBC driver and Spark:
Dataset<Row> df = spark.read().format("jdbc")
        .option("url", databricks_url)
        .option("driver", "com.databricks.client.spark.jdbc.SparkJDBC42Driver")
        .option("dbtable", "default.age")
        .load();
df.show();
The full stack trace:
Exception in thread "main" java.lang.ClassCastException: com.databricks.client.spark.jdbc.SparkJDBC42Driver cannot be cast to java.sql.Driver
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:101)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:101)
at scala.Option.foreach(Option.scala:437)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:101)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:39)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:33)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at SparkTest.main(SparkTest.java:31)
CodePudding user response:
I solved this problem as follows: I installed the ODBC driver from https://www.databricks.com/fr/spark/odbc-drivers-download while keeping the JDBC jar in the libs, removed the driver option from the spark.read call, and made two changes to the URL:
- replacing spark with databricks (so the scheme becomes jdbc:databricks://)
- adding these options: Driver=Simba; UseNativeQuery=0;
String databricks_url = "jdbc:databricks://adb-xxxxx.azuredatabricks.net:443/default;"
        + "transportMode=http;"
        + "ssl=1;"
        + "httpPath=sql/protocolv1/o/....;"
        + "AuthMech=3;"
        + "UID=token;"
        + "PWD=<my databricks_token>;"
        + "Driver=Simba;"
        + "UseNativeQuery=0;";
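The URL above is just a semicolon-separated list of key=value options appended to the jdbc:databricks:// base. A small helper can make that structure explicit (a sketch only; the host, httpPath, and token values are placeholders you must replace with your workspace's own):

```python
def build_databricks_url(host: str, http_path: str, token: str) -> str:
    """Assemble a Databricks JDBC URL from its parts.

    The options mirror the ones in the URL above: HTTP transport over
    SSL, token authentication (AuthMech=3 with UID=token), the Simba
    driver, and UseNativeQuery=0.
    """
    options = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",
        "UID": "token",
        "PWD": token,
        "Driver": "Simba",
        "UseNativeQuery": "0",
    }
    opts = ";".join(f"{k}={v}" for k, v in options.items())
    return f"jdbc:databricks://{host}:443/default;{opts};"

# Example with placeholder values:
databricks_url = build_databricks_url(
    "adb-xxxxx.azuredatabricks.net",
    "sql/protocolv1/o/....",
    "<my databricks_token>",
)
```

Keeping the options in a dict also makes it easy to tweak a single setting without editing a long string literal.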
In PySpark, the query must be wrapped in single quotes ('):
df = spark.read.format("jdbc").option("url", databricks_url) \
.option("query", 'SELECT * FROM table') \
.load()
df.show(vertical=True)
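A quick plain-Python illustration of why single quotes are convenient here (no Spark needed): SQL pushed down via the query option often contains double quotes or backticks, and single-quoted Python literals let those through without escaping. The table and column names below are hypothetical.

```python
# Outer single quotes mean the inner double quotes need no escaping;
# the same string with outer double quotes would require \" escapes.
query = 'SELECT * FROM default.age WHERE name = "Alice"'
```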