Databricks SQL doesn't seem to support SQL Server

Time: 05-17

I've created some Hive tables using a JDBC connection in a Python notebook on Databricks, in the Data Science & Engineering UI. I'm able to query the tables in a Databricks notebook and run SQL directly with the %sql magic command.
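For context, here's a minimal sketch of how such a JDBC-backed table might be registered from a Python notebook (the JDBC URL, credentials, and table names are hypothetical placeholders):

# Hypothetical sketch: registering a JDBC-backed table in the Hive metastore
# from a Python notebook; all connection details below are placeholders.
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo_db.customers
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb',
    dbtable 'dbo.customers',
    user 'my_user',
    password 'my_password'
  )
""")

# Querying works fine from a Data Science & Engineering notebook:
spark.sql("SELECT * FROM demo_db.customers LIMIT 10").show()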

When switching to the Databricks SQL UI, I'm still able to see the tables in the Hive metastore explorer. However, I'm not able to read the data: a very clear message says that only CSV, Parquet, and other file-based formats are supported.

I found this surprising: if I can use the data in the Data Science & Engineering UI, why isn't that the case in Databricks SQL? Is there any way to work around this?

CodePudding user response:

Yes, it's a known limitation that Databricks SQL currently supports only file-based formats. As I remember, it's related to the security model, plus the fact that DBSQL uses Photon under the hood, where JDBC integration might not be very performant. You can reach out to your solution architect or customer success engineer to find out whether it will be supported in the future.

The only current workaround is to have a job that periodically reads all the data from the database via JDBC and dumps it into a Delta table; see the sketch below. Querying the Delta table could even be more performant than going through JDBC; the only issue is the freshness of the data.
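A minimal sketch of such a sync job, assuming a SQL Server source (the JDBC URL, credentials, and table names are placeholders; in practice you would run this on a schedule as a Databricks job):

# Hypothetical periodic sync job: pull the source table over JDBC
# and overwrite a Delta table that Databricks SQL can query.
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
      .option("dbtable", "dbo.customers")
      .option("user", "my_user")
      .option("password", "my_password")
      .load())

(df.write
   .format("delta")
   .mode("overwrite")  # full refresh each run; data is only as fresh as the schedule
   .saveAsTable("demo_db.customers_delta"))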

CodePudding user response:

You can import a Hive table from cloud storage into Databricks as an external table and query it with Databricks SQL.

Step 1: Show the CREATE TABLE statement

Issue a SHOW CREATE TABLE <tablename> command on your Hive command line to see the statement that created the table.

Refer to the example below:

hive> SHOW CREATE TABLE wikicc;
OK
CREATE  TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'
TBLPROPERTIES (
  'totalSize'='2335',
  'numRows'='240',
  'rawDataSize'='2095',
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'transient_lastDdlTime'='1418173653')

Step 2: Issue a CREATE EXTERNAL TABLE statement

If the statement that is returned uses a CREATE TABLE command, copy the statement and replace CREATE TABLE with CREATE EXTERNAL TABLE.

  • EXTERNAL ensures that Spark SQL does not delete your data if you drop the table.

  • You can omit the TBLPROPERTIES field.

DROP TABLE wikicc

CREATE EXTERNAL TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'

Step 3: Issue SQL commands on your data

SELECT * FROM wikicc

Source: https://docs.databricks.com/data/data-sources/hive-tables.html
