I'm trying to connect from one Databricks instance (A) to another Databricks instance (B) to read its tables and schemas (databases).
How can I fetch the schemas from instance B using JDBC?
This is what I've tried:
sparkUrl = f"jdbc:{UrlPrefix}://{Host}:{Port}/default;transportMode=http;ssl=1;httpPath={Db};AuthMech=3;UID={Uid};PWD={Pwd};UseNativeQuery=1"
spark.read.jdbc(url=sparkUrl, table="(SHOW SCHEMAS)")
But the system always returns an error:
java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Table or view not found: SHOW; line 1 pos 15
If I run a "normal" SELECT statement from instance A against instance B's tables, it returns the records without errors.
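For example, something like this works fine (the table name is just a placeholder):

# Works: even after being wrapped in an outer query, a SELECT subquery is valid SQL.
df = spark.read.jdbc(url=sparkUrl, table="(SELECT * FROM default.some_table) t")
df.show()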
I read in the documentation that the JDBC driver always wraps the SQL statement like this:
select * from (query)
That seems to be exactly what is happening here, and that's why I get the error.
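To make that concrete, here is roughly the probe that Spark's JDBC reader (it is the reader, not the driver itself, that builds the outer query) would send for my table value; the exact shape varies by Spark version:

table = "(SHOW SCHEMAS)"
probe = f"SELECT * FROM {table} WHERE 1=0"  # rough shape of Spark's schema probe
print(probe)                # SELECT * FROM (SHOW SCHEMAS) WHERE 1=0
print(probe.index("SHOW"))  # 15 -- lines up with "line 1 pos 15" in the error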
Can I turn off this default wrapping behavior so I can run the statement on instance B exactly as written? I can switch to Scala if such an option is available.
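For reference, one direction I've been looking at is calling the JDBC driver directly through Spark's py4j gateway, so nothing wraps the statement. A rough, untested sketch (it assumes the Simba driver class com.simba.spark.jdbc.Driver is already on the driver's classpath):

# Untested sketch: bypass spark.read.jdbc and talk to the driver directly,
# so the statement reaches instance B exactly as written.
jvm = spark.sparkContext._gateway.jvm
jvm.java.lang.Class.forName("com.simba.spark.jdbc.Driver")  # register the driver
conn = jvm.java.sql.DriverManager.getConnection(sparkUrl)
try:
    stmt = conn.createStatement()
    rs = stmt.executeQuery("SHOW SCHEMAS")
    while rs.next():
        print(rs.getString(1))  # first column of each result row
finally:
    conn.close()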
CodePudding user response:
I was facing the same issue: the SHOW keyword was not recognized, so I used an alternative method to get the same result. I am using the databricks-sql-connector provided by Databricks to query another instance. Here is a detailed link for reference: https://docs.databricks.com/dev-tools/python-sql-connector.html
- You need to install the connector on the instance (pip install databricks-sql-connector)
- Generate an access token to access the cluster
- You can find the hostname and HTTP path details on the cluster's JDBC/ODBC tab
from databricks import sql

# Connection details come from the remote cluster's JDBC/ODBC tab.
with sql.connect(server_hostname="databricks.com",
                 http_path="sql/protocolv1/o/0/2",
                 access_token="asdjaskdjaskXXXX") as connection:
    with connection.cursor() as cursor:
        # SHOW SCHEMAS is sent as-is; nothing wraps it in a subquery here.
        cursor.execute("SHOW SCHEMAS")
        result = cursor.fetchall()
        for row in result:
            print(row)
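If you also need the tables inside each schema (as the question mentions), the same cursor can be reused per schema. A minimal sketch with the same placeholder connection details:

from databricks import sql

with sql.connect(server_hostname="databricks.com",
                 http_path="sql/protocolv1/o/0/2",
                 access_token="asdjaskdjaskXXXX") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SHOW SCHEMAS")
        schemas = [row[0] for row in cursor.fetchall()]
        for schema_name in schemas:
            # List the tables of each schema on the remote instance.
            cursor.execute(f"SHOW TABLES IN `{schema_name}`")
            for table_row in cursor.fetchall():
                print(table_row)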