I am trying to determine why a query that returns CLOB data runs so much faster using python3 and cx_Oracle in comparison to PHP 7.4 with OCI8.
Oracle Client Libraries version is 19.5.0.0.0. Queries are ran on the same client and against the same database using the same user. See below for the test PHP and Python scripts. The PHP script takes about 16 seconds or so to retrieve the result set of 220 rows. The Python script takes all of 0.2 seconds to run and print the result set to the console. Is there some kind of client side caching occurring with the python cx_Oracle package? Or is the difference in the outputtypehandler to retrieve the CLOB as a string compared to how the PHP extension retrieves it as a string? Possibly the cx_Oracle package takes advantage of the LOB prefetching with Oracle 12.2 and greater?
<?php
// Create connection to Oracle, change HOST IP and SID string!
$conn = oci_connect("user", "password", 'host');
print oci_client_version() . PHP_EOL;
if (!$conn) {
$m = oci_error();
echo $m['message'] . PHP_EOL;
exit;
}
else {
print "Oracle database connection online" . PHP_EOL;
}
$stid = oci_parse($conn, "SELECT
EMS.MODULE_NAME,
EM.MAINT_TITLE,
EM.MAINT_VALIDATION,
CASE EMS.DATA_TYPE WHEN 8 THEN 'Logic' WHEN 9 THEN 'Action' END AS DATA_TYPE,
EM.MAINT_AUTHOR,
-- Return all three of the below to speed up the report as LOB retrieval is slow
-- CAST(EMS.EKM_INFO AS VARCHAR2(4000)) AS CLOB_VARCHAR,
-- DBMS_LOB.GETLENGTH(EMS.EKM_INFO) AS CLOB_LENGTH,
EMS.EKM_INFO AS CLOB
FROM
EKS_MODULE EM,
EKS_MODULESTORAGE EMS
WHERE
EMS.MODULE_NAME = EM.MODULE_NAME
AND EMS.VERSION = EM.VERSION
AND EMS.DATA_TYPE IN (8, 9)
AND EMS.EKM_INFO LIKE '%@PAGER%' -- This is a CLOB field so we can filter out anything that doesn't actually use @PAGER
AND EM.ACTIVE_FLAG = 'A'
ORDER BY
EM.MAINT_VALIDATION DESC, EMS.MODULE_NAME");
list($usec, $sec) = explode(" ", microtime());
$start = (float)$usec (float)$sec;
oci_execute($stid);
$num_results = oci_fetch_all($stid, $results);
list($usec, $sec) = explode(" ", microtime());
$end = (float)$usec (float)$sec;
print ($end - $start) . ' seconds' . PHP_EOL;
print $num_results . PHP_EOL;
?>
import cx_Oracle
import time
from pprint import pprint
def output_type_handler(cursor, name, default_type, size, precision, scale):
if default_type == cx_Oracle.DB_TYPE_CLOB:
return cursor.var(cx_Oracle.DB_TYPE_LONG, arraysize=cursor.arraysize)
if default_type == cx_Oracle.DB_TYPE_BLOB:
return cursor.var(cx_Oracle.DB_TYPE_LONG_RAW, arraysize=cursor.arraysize)
print(cx_Oracle.clientversion())
connection = cx_Oracle.connect(
user="user",
password="password",
dsn="server")
print("Successfully connected to Oracle Database")
cursor = connection.cursor()
sql = """
SELECT
EMS.MODULE_NAME,
EM.MAINT_TITLE,
EM.MAINT_VALIDATION,
CASE EMS.DATA_TYPE WHEN 8 THEN 'Logic' WHEN 9 THEN 'Action' END AS DATA_TYPE,
EM.MAINT_AUTHOR,
-- Return all three of the below to speed up the report as LOB retrieval is slow
-- CAST(EMS.EKM_INFO AS VARCHAR2(4000)) AS CLOB_VARCHAR,
-- DBMS_LOB.GETLENGTH(EMS.EKM_INFO) AS CLOB_LENGTH,
EMS.EKM_INFO AS CLOB
FROM
EKS_MODULE EM,
EKS_MODULESTORAGE EMS
WHERE
EMS.MODULE_NAME = EM.MODULE_NAME
AND EMS.VERSION = EM.VERSION
AND EMS.DATA_TYPE IN (8, 9)
AND EMS.EKM_INFO LIKE '%@PAGER%' -- This is a CLOB field so we can filter out anything that doesn't actually use @PAGER
AND EM.ACTIVE_FLAG = 'A'
ORDER BY
EM.MAINT_VALIDATION DESC, EMS.MODULE_NAME
"""
start = time.time()
connection.outputtypehandler = output_type_handler
cursor.execute(sql)
rows = cursor.fetchall()
stop = time.time()
pprint(rows)
print(f"Elapsed time: {stop - start}")
CodePudding user response:
A big difference is the output type handler in the Python example. (But note that this will only allow LOBS up to 1GB to be fetched). This is much more efficient than the internal fetching of Oracle LOB locators that happens without the Python type handler, or happens in PHP OCI8. Locators are like pointers. With locators there is the need for extra round-trips between the app and the DB to get the actual LOB data. This is mentioned in the cx_Oracle doc and in the example return_lobs_as_strings.py.
Another difference is that cx_Oracle supports row array fetching (getting multiple rows each time an access to the DB is made), whereas PHP OCI8 only has row prefetching - which doesn't work with LOBs. Fetching each row in PHP requires a separate round trip to the database.
With PHP OCI8 3.2 from PECL for PHP 8.1, or with the bundled OCI8 in the PHP 8.2 development branch, you can enable LOB prefetching which can give a nice boost for PHP OCI8 when dealing with non-huge LOBS. This can remove round-trips to get data from each locator. However you can't overcome the lack of array fetching in PHP OCI8.
Some snippets to count round-trips in PHP are:
function getSid($c) {
$sql = "SELECT sys_context('userenv', 'sid') FROM dual";
$s = oci_parse($c, $sql);
oci_execute($s);
$arr = oci_fetch_array($s);
return($arr[0]);
}
function getRoundTrips($systemconn, $sid) {
$sql = <<<'ENDND'
SELECT ss.value
FROM v$sesstat ss, v$statname sn
WHERE ss.sid = :sid
AND ss.statistic# = sn.statistic#
AND sn.name LIKE '%roundtrip%client%'
ENDND;
$s = oci_parse($systemconn, $sql);
oci_bind_by_name($s, ":sid", $sid);
oci_execute($s);
$arr = oci_fetch_array($s);
return($arr[0]);
}
You can call getRoundTrips()
before and after running your application query. The difference between the two values is the number of round trips taken by your query. Note the use of a different, system connection to get the stats. These snippets can easily be rewritten in Python.
The implementation differences between cx_Oracle and PHP OCI8 are historic, driven by complexity. PHP OCI8 takes advantage of some convenience functionality in the Oracle Client libraries. This makes PHP OCI8 code simpler, but has a couple of drawbacks when it comes to LOBs.
Drivers like Python cx_Oracle, Node.js node-oracledb, Go godror etc that use Oracle's ODPI-C layer support tunable array fetching (getting multi-rows) and optional getting of LOBs via 'fetch as string/buffer'. They also support tunable row-prefetching, which is similar to array fetching (but handled by Oracle Client libraries). They don't do LOB data pre-fetching, because the 'fetch as string/buffer' removes the need for it.