Home > Software design >  cx_Oracle in Azure Databricks
cx_Oracle in Azure Databricks

Time:06-16

I am unable to establish connection to my Oracle database from Azure Databricks although it works in ADF where I am able to query the table. But ADF takes time to filter the records so I am still trying to connect from Databricks.

I followed the steps from Even-logs

Error message when I tried to establish the connection: DPI-1047: Cannot locate a 64-bit Oracle Client library: "/databricks/driver/oracle_ctl//lib/libclntsh.so: cannot open shared object file: No such file or directory".

When I executed the following command

dbutils.fs.ls("/databricks/driver/")

there was no such directory

This triggered me to post some questions here:

  1. Does this mean the init-script did not perform its job?

  2. Is /databricks/driver/oracle_ctl a hidden directory for dbutils.fs.ls?

  3. Error message points to /databricks/driver/oracle_ctl//lib/libclntsh.so, when I manually inspected the downloaded oracle client, there is no such folder called lib although libclntsh.so exists in the main directory. Is there a problem that databricks is checking the wrong directory for the libclntsh.so?

  4. Does this connections still works for others?

Syntax for connection: cx_Oracle.connect(user= user_name, password= password,dsn= IP ':' Port '/' DB_name)

Above syntax works fine when connected from inside a on-premises machine.

CodePudding user response:

You will get the above error if you don’t have Oracle instant client in your Cluster.

To resolve above error in azure databricks, please follow this code:

%sh

mkdir -p /opt/oracle
cd /opt/oracle
wget https://download.oracle.com/otn_software/nt/instantclient/19600/instantclient-basic-windows.x64-19.6.0.0.0dbru.zip
unzip instantclient-basic-windows.x64-19.6.0.0.0dbru.zip
set ORACLE_HOME=%ORABAS%\instantclient_19_3
set TNS_ADMIN=%ORACLE_HOME%
set PATH=%ORACLE_HOME%;%PATH%

Ref1

To create init script, use the following code:

As per official Ref2

To read data from oracle database in PySpark follow this article by Emrah Mete

For more information refer this official document:

https://docs.databricks.com/data/data-sources/oracle.html#oracle

CodePudding user response:

Try installing the latest major release of cx_Oracle - which got renamed to python-oracledb, see the release announcement.

This version doesn't need Oracle Instant Client. The API is the same as cx_Oracle, although obviously the name is different.

If I understand the instructions, your init script would do something like:

/databricks/python/bin/pip install oracledb

Application code would be like:

import oracledb

connection = oracledb.connect(user='scott', password=mypw, dsn='yourdbhostname/yourdbservicename')
with connection.cursor() as cursor:
  for row in cursor.execute('select city from locations'):
      print(row)

Resources:

Home page: oracle.github.io/python-oracledb/

Quick start: Quick Start python-oracledb Installation

Documentation: python-oracle.readthedocs.io/en/latest/index.html

PyPI: pypi.org/project/oracledb/

Source: github.com/oracle/python-oracledb

Upgrading: Upgrading from cx_Oracle 8.3 to python-oracledb

CodePudding user response:

Changed the path from "/databricks/driver/oracle_ctl/" to "/databricks/driver/oracle_ctl/instantclient" in the init-script and that error does not appear anymore.

Please use the following init script instead

dbutils.fs.put("dbfs:/databricks/<init-script-folder-name>/oracle_ctl.sh","""
#!/bin/bash
sudo apt-get install libaio1
wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
mv /databricks/driver/oracle_ctl/instantclient* /databricks/driver/oracle_ctl/instantclient
sudo echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
sudo echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
""", True)

Notes:

  1. The above init-script was advised by a databricks employee and can be found here.

  2. As mentioned by Christopher Jones in one of the comments, cx_Oracle has been recently upgraded to oracledb with a thin and thick version.

  • Related