How to run Oracle PL/SQL in python-CodePudding

I'm using Jupyter notebook to run a PL/SQL script but I get an error. The code block in the notebook is as following:

%%sql

DECLARE BEGIN
    FOR record_item IN (
        SELECT
            *
        FROM
            duplicated_records
    ) LOOP
        EXECUTE IMMEDIATE 'UPDATE table_name SET record_id ='|| record_item.original_record_id || ' WHERE record_id =' || record_item.duplicated_record_id;
        EXECUTE IMMEDIATE 'DELETE FROM records WHERE id ='|| record_item.duplicated_record_id;
    END LOOP;
END

The error is

(cx_Oracle.DatabaseError) ORA-06550: line 8, column 165:
PLS-00103: Encountered the symbol "end-of-file" when expecting one of the following:

Non PL/SQL code seems to work such as select, update statements etc.

It works perfectly fine from other sql clients like SQL developer. I've tried adding/removing the ; at the end but it still doesn't work.

CodePudding user response：

I don't know Python so I can't assist about that, but - as far as Oracle is concerned - you don't need DECLARE (as you didn't declare anything), and you certainly don't need dynamic SQL (EXECUTE IMMEDIATE) as there's nothing dynamic there.

Rewritten:

BEGIN
  FOR record_item IN (SELECT * FROM duplicated_records) LOOP
    UPDATE table_name
        SET   record_id = record_item.original_record_id
        WHERE record_id = record_item.duplicated_record_id;

    DELETE FROM records
        WHERE        id = record_item.duplicated_record_id;
  END LOOP;
END;

On the other hand, row-by-row processing is slow-by-slow. Consider using two separate statements: one which will update existing rows, and another which will delete rows (from a different table, apparently):

merge into table_name a
  using duplicated_records b
  on (a.record_id = b.duplicate_record_id) 
  when matched then update set
    a.record_id = b.original_record_id;

delete from records a
  where a.id in (select b.duplicated_record_id from duplicated_records b);

If tables are properly indexed (on ID columns), that should behave better (faster).

CodePudding user response：

The direct implementation of your code in Python would be like:

import oracledb
import traceback
import os
import sys

#if sys.platform.startswith('darwin'):
#    oracledb.init_oracle_client(lib_dir=os.environ.get('HOME') '/Downloads/instantclient_19_8')

un = os.environ.get('PYTHON_USERNAME')
pw = os.environ.get('PYTHON_PASSWORD')
cs = os.environ.get('PYTHON_CONNECTSTRING')

try:
    connection = oracledb.connect(user=un, password=pw, dsn=cs)

    with connection.cursor() as cursor:
        plsql = """BEGIN
                       FOR RECORD_ITEM IN (
                           SELECT
                               *
                           FROM
                               DUPLICATED_RECORDS
                       ) LOOP
                           EXECUTE IMMEDIATE 'UPDATE table_name SET record_id ='
                               || RECORD_ITEM.ORIGINAL_RECORD_ID
                               || ' WHERE record_id ='
                               || RECORD_ITEM.DUPLICATED_RECORD_ID;
                           EXECUTE IMMEDIATE 'DELETE FROM records WHERE id ='
                               || RECORD_ITEM.DUPLICATED_RECORD_ID;
                       END LOOP;
                   END;"""

        cursor.execute(plsql)

except oracledb.Error as e:
    error, = e.args
    traceback.print_tb(e.__traceback__)
    print(error.message)

For this you need to install the oracledb module (which is the renamed, latest version of the cx_Oracle module. It will work with cx_Oracle too with some renaming).

However, before blindly copying this, check @littlefoot's answer for more about the PL/SQL code.