Is there any possibility to monitor the progress of a huge insertion? Let's say we have this code:
MERGE INTO huge_table ht
USING (
SELECT column1, column2
FROM another_huge_table
) aht ON (ht.column1 = aht.column1)
WHEN NOT MATCHED THEN
INSERT (column1, column2)
VALUES (aht.column1, aht.column2);
COMMIT;
The tables can contain millions of rows and the insertion process can run for hours. Is there any way to monitor DML progress in Oracle?
CodePudding user response:
Short answer: no database engine can do that out of the box. You need to understand that, before starting the execution, Oracle does not know how many rows will be merged. One option, as stated in the comments, would be to split the query into chunks in PL/SQL and use a logging method to track rows committed versus time spent. But that will be much slower than running a single merge statement, and I can't see why it would be worth it.
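For completeness, a chunked merge with logging could be sketched like this. Everything here is illustrative, not from the question: it assumes column1 is a numeric key, and the progress_log table and the 100k chunk size are made up.

```sql
-- Sketch only: log table to track progress of a chunked merge.
create table progress_log ( logged_at timestamp, last_key number, rows_merged number );

declare
  l_chunk constant pls_integer := 100000;  -- assumed chunk size
  l_lo number;
  l_hi number;
begin
  select min(column1), max(column1) into l_lo, l_hi from another_huge_table;
  while l_lo <= l_hi
  loop
    -- merge only the rows whose key falls in the current range
    merge into huge_table ht
    using ( select column1, column2
              from another_huge_table
             where column1 between l_lo and l_lo + l_chunk - 1 ) aht
    on ( ht.column1 = aht.column1 )
    when not matched then
      insert (column1, column2) values (aht.column1, aht.column2);
    -- record how far we got before committing the chunk
    insert into progress_log values ( systimestamp, l_lo + l_chunk - 1, sql%rowcount );
    commit;
    l_lo := l_lo + l_chunk;
  end loop;
end;
/
```

Querying progress_log while this runs shows rows merged per chunk over time, at the cost of many commits and a much slower overall run.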
Nevertheless, this approach can help you somewhat. For the sake of the example, I am assuming your merge statement is not running in parallel. Oracle offers the v$session_longops dictionary view to monitor long operations, defined as those that run for more than 6 seconds. However, it reports progress per operation, so you must understand each step within the execution plan.
Let's do a PoC
Tables
SQL> create table test.huge_table ( c1 number, c2 varchar2(40) , c3 varchar2(40) ) ;
Table created.
SQL> create table test.another_huge_table ( c1 number, c2 varchar2(40) , c3 varchar2(40) ) ;
Table created.
Let's insert 5M rows into the target table and 10M into the source table, so that the merge inserts 5M rows.
SQL> declare
begin
  for i in 1 .. 5000000
  loop
    insert into test.huge_table
    values
    ( i,
      lpad(dbms_random.string('A',1),round(dbms_random.value(20,30)),dbms_random.string('A',1)),
      lpad(dbms_random.string('A',1),round(dbms_random.value(20,30)),dbms_random.string('A',1))
    );
  end loop;
  commit;
end;
SQL> /
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats ( ownname => 'TEST' , tabname => 'HUGE_TABLE' , block_sample => true ) ;
PL/SQL procedure successfully completed.
SQL> declare
begin
  for i in 1 .. 10000000
  loop
    insert into test.another_huge_table
    values
    ( i,
      lpad(dbms_random.string('A',1),round(dbms_random.value(20,30)),dbms_random.string('A',1)),
      lpad(dbms_random.string('A',1),round(dbms_random.value(20,30)),dbms_random.string('A',1))
    );
    if i = 5000000
    then
      commit;
    end if;
  end loop;
  commit;
end;
SQL> /
SQL> /
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats ( ownname => 'TEST' , tabname => 'ANOTHER_HUGE_TABLE' , block_sample => true ) ;
PL/SQL procedure successfully completed.
Let's check the plan.
SQL> explain plan for
merge into test.huge_table t
using (
select c1,c2,c3 from test.another_huge_table
) s
ON (t.c1 = s.c1)
when not matched then
insert (c1, c2, c3)
values ( s.c1, s.c2, s.c3 );
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4265949913
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 10M| 1096M| | 67324 (1)| 00:00:06 |
| 1 | MERGE | HUGE_TABLE | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 10M| 1106M| 333M| 67324 (1)| 00:00:06 |
| 4 | TABLE ACCESS FULL | HUGE_TABLE | 5000K| 276M| | 8232 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL | ANOTHER_HUGE_TABLE | 10M| 553M| | 16291 (1)| 00:00:02 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T"."C1"(+)="C1")
We then run the merge.
SQL> merge into test.huge_table t
using (
select c1,c2,c3 from test.another_huge_table
) s
ON (t.c1 = s.c1)
when not matched then
insert (c1, c2, c3)
values ( s.c1, s.c2, s.c3 );
5000000 rows merged.
Elapsed: 00:00:36.39
SQL> commit ;
Commit complete.
Elapsed: 00:00:00.01
While the merge was running, I could get information about the steps:
SQL> select opname,
            target,
            round( ( sofar/totalwork ), 2 ) * 100 percentage_complete,
            start_time,
            ceil( time_remaining / 60 ) max_time_remaining_in_min,
            floor( elapsed_seconds / 60 ) time_spent_in_min
       from v$session_longops
      where sofar != totalwork ;
OPNAME      TARGET                   PERCENTAGE_COMPLETE START_TIM MAX_TIME_REMAINING_IN_MIN TIME_SPENT_IN_MIN
----------- ------------------------ ------------------- --------- ------------------------- -----------------
Table Scan  TEST.ANOTHER_HUGE_TABLE                   86 28-OCT-21                         1                 0

OPNAME      TARGET                   PERCENTAGE_COMPLETE START_TIM MAX_TIME_REMAINING_IN_MIN TIME_SPENT_IN_MIN
----------- ------------------------ ------------------- --------- ------------------------- -----------------
Hash Join   TEST.ANOTHER_HUGE_TABLE                   42 28-OCT-21                         1                 0

OPNAME      TARGET                   PERCENTAGE_COMPLETE START_TIM MAX_TIME_REMAINING_IN_MIN TIME_SPENT_IN_MIN
----------- ------------------------ ------------------- --------- ------------------------- -----------------
Hash Join   TEST.ANOTHER_HUGE_TABLE                   77 28-OCT-21                         1                 0
Summary
If Oracle thinks any of those steps will take more than 6 seconds, it will include them in v$session_longops. You can query this view to get an estimation per step, but it is not a real measure of how long the whole DML will take. Still, it can be a good start.
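If you do end up chunking the DML yourself, you can also publish your own progress rows into v$session_longops with dbms_application_info.set_session_longops. A minimal sketch, where the operation name and the chunk count are made up:

```sql
declare
  l_rindex binary_integer := dbms_application_info.set_session_longops_nohint;
  l_slno   binary_integer;
  l_total  constant pls_integer := 100;  -- assumed total number of chunks
begin
  for i in 1 .. l_total
  loop
    -- ... process chunk i here ...
    -- update the longops row so other sessions can watch the progress
    dbms_application_info.set_session_longops(
      rindex    => l_rindex,
      slno      => l_slno,
      op_name   => 'Chunked merge',
      sofar     => i,
      totalwork => l_total,
      units     => 'chunks' );
  end loop;
end;
/
```

Another session can then follow the 'Chunked merge' row in v$session_longops with the same query shown above.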
Another approach, perhaps even better, would be to build a baseline. You can wrap the merge in PL/SQL, or in any other language for that matter, with a logging method that records start time, end time and rows merged. Over dozens of executions you can apply statistics to get an average and/or standard deviation. I use a similar approach to generate this kind of information, which lets me make educated guesses, nothing more, nothing less.
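As a sketch, such a wrapper could be as simple as the following; the merge_baseline table and its columns are illustrative (the identity column needs 12c or later):

```sql
-- Hypothetical baseline table: one row per merge execution.
create table merge_baseline (
  run_id      number generated always as identity,
  started_at  timestamp,
  finished_at timestamp,
  rows_merged number
);

declare
  l_start timestamp := systimestamp;
  l_rows  number;
begin
  merge into huge_table ht
  using ( select column1, column2 from another_huge_table ) aht
  on ( ht.column1 = aht.column1 )
  when not matched then
    insert (column1, column2) values (aht.column1, aht.column2);
  l_rows := sql%rowcount;
  -- record duration and row count for this run
  insert into merge_baseline ( started_at, finished_at, rows_merged )
  values ( l_start, systimestamp, l_rows );
  commit;
end;
/
```

Averaging finished_at - started_at over many runs gives you a realistic expectation for the next one.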
Hope this will help you somehow.