Home > Enterprise >  How to delete records in parallel from oracle table
How to delete records in parallel from oracle table

Time:04-20

We are maintaining audit of our application in table 'application_audit'.
I am trying to write the stored procedure to delete records from this table which we don't need any more.
So far, I have written below stored procedure but I found that it is taking lot of time when number of rows to be deleted are more than 100k.

Can you please help me to implement parallel sessions OR optimize delete query in below stored procedure to speedup the execution.

In production, this table will have at least 5 million rows at any given point of time and from what I can see, if we execute this stored proc everyday then there will be at least 100k records to be deleted.

In below query, COMPONENT_NAME='REQUESTPURGE' means that for that particular request number purge already happened and there is no request data present in our active database instance for that request number so all records in 'application_audit' table with that request number become eligible for deletion.

Stored procedure:

create or replace PROCEDURE APPLICATION_AUDIT_PURGE_RECORD
IS
purgewait number := 30;
BEGIN
  DBMS_OUTPUT.PUT_LINE('Application audit purge started with purge wait value as   '||purgewait||' days');
  delete from application_audit where id in (select id from application_audit where request_number in (select request_number from application_audit where COMPONENT_NAME='REQUESTPURGE' and trunc(timestamp) < trunc(sysdate - purgewait)));

END APPLICATION_AUDIT_PURGE_RECORD;

Table:

  CREATE TABLE "APPLICATION_AUDIT" (    
    "ID" NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL, 
    "MESSAGE_TYPE" VARCHAR2(64 CHAR), 
    "COMPONENT_NAME" VARCHAR2(64 CHAR), 
    "USERNAME" VARCHAR2(32 CHAR), 
    "TIMESTAMP" TIMESTAMP (6) WITH TIME ZONE NOT NULL, 
    "REQUEST_NUMBER" VARCHAR2(64 CHAR), 
    "MODULE_NAME" VARCHAR2(256 CHAR), 
    "PROCESS_NAME" VARCHAR2(256 CHAR), 
    "VERSION" VARCHAR2(64 CHAR), 
    "TASK" VARCHAR2(64 CHAR), 
    "ERROR_CODE" VARCHAR2(256 CHAR), 
    "ERROR_MESSAGE" VARCHAR2(4000 CHAR), 
    "MESSAGE" VARCHAR2(4000 CHAR)
   )

CodePudding user response:

Deleting 5 million records shouldn't be that time consuming.

Having said that, you can try adding a parallel hint to the DELETE statement.

First enable


ALTER SESSION ENABLE PARALLEL DML;

If that's not helping, you could look into:

Disabling indexes on the table

But, of course, any queries needing and using those indexes will be slower while your delete runs. So you're just trading one slow statement for (lots of) others. And you'll have to rebuild them afterwards which will take (possibly a looooooong) time.

You can look into chunking by SQL or rowid

If none of these help enough, you may need to look into more radical solutions.

Such as saving the data you want to keep in a temporary table. Then dropping the current table and renaming the temporary one. e.g.:

create table tmp as select ...data you want to keep... from old_tab;
drop old_tab;
rename tmp to old_tab;
-- run grants, indexes etc. that were on the original table
...

But you need an outage to do this.

I would suggest track down where the bottleneck is occurring first with an explain plan or trace as it sounds like you have an underlying problem if 5 million deletes are taking a long time

CodePudding user response:

I think your DELETE query can be oversimplified to -

DELETE FROM application_audit 
 WHERE COMPONENT_NAME = 'REQUESTPURGE' 
   AND TRUNC(timestamp) < TRUNC(SYSDATE - purgewait);

You can try having an index on COMPONENT_NAME column as well.

CodePudding user response:

I see it suffices to find one row with component_name = 'REQUESTPURGE' to delete all rows with the same request number. This means that the component_name alone doesn't tell us whether to delete a row or not. Otherwise I'd have uggested to use table partitions here.

As is, all I can think of is providing appropriate indexes. First of all, though, your query can be simplified to:

delete from application_audit 
where request_number in 
(
  select request_number 
  from application_audit 
  where component_name = 'REQUESTPURGE' 
  and timestamp < trunc(sysdate - purgewait)
);

The indexes I suggest for this statement:

create index idx1 on application_audit (component_name, timestamp, request_number);
create index idx2 on application_audit (request_number);
  • Related