Home > Back-end >  Can you fetch schemas from spark databases via jdbc?
Can you fetch schemas from spark databases via jdbc?

Time:03-24

I'm trying to connect from an instance A in databricks to another databricks instance B to read its tables and schemas (databases)

How can I fetch the schemas from instance B using jdbc?

This is what I've tried:

sparkUrl = f"jdbc:{UrlPrefix}://{Host}:{Port}/default;transportMode=http;ssl=1;httpPath={Db};AuthMech=3;UID={Uid};PWD={Pwd};UseNativeQuery=1"
spark.read.jdbc( url=sparkUrl, table=f'(SHOW SCHEMAS)') 

But the system always return an error:

java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Table or view not found: SHOW; line 1 pos 15

If I run a "normal" SELECT statement from instance A against instance B tables it returns the records without errors.

I read in the documentation that the jdbc driver always tries to surround the sql statement like

select * from (query)

It seems that this is the behavior that is happening, hence that's how I get the error.

Can I stop the default behavior of the jdbc driver to wrap jdbc queries so I can run the straight code in instance B?

I can switch to scala if such option is available.

CodePudding user response:

I was facing the same issue, the 'show' keyword was not recognizable so used alternative methods to get the same. I am using the databricks-sql-connector provided by databricks to query another instance. I am providing a detailed link for reference - https://docs.databricks.com/dev-tools/python-sql-connector.html

  1. You need to install the connector on the instance
  2. generate access_token to access the cluster
  3. you can get all the hostname and path details on the cluster jdbc/odbc tab.
from databricks import sql
with sql.connect(server_hostname="databricks.com",
                 http_path="sql/protocolv1/o/0/2",
                 access_token="asdjaskdjaskXXXX") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SHOW SCHEMAS ")
        result = cursor.fetchall()

        for row in result:
          print(row)
  • Related