Home > database >  Oracle SQL script parallel execution
Oracle SQL script parallel execution

Time:01-27

For my job I need to prepare two tables (CTAS) and then do some joins between them. For this job I created a script (run it in SQL Developer) which consequentially creates these two tables one after another. Since these two tables are not related I'd like to start creating them in parallel. Is it possible in SQL script to start two table creations (or two other scripts) in parallel and then proceed when both finish their jobs?

CodePudding user response:

Here's one option.

I wouldn't really CTAS - I'd rather create both tables in advance, and then insert rows into them. Why? Because this approach uses stored procedures which - in order to perform DDL (which is CTAS) - require dynamic SQL. Not that it is impossible to do that; on the contrary, but it is way simpler NOT to use it.

I'd create yet another table (let's call it table_done) which contains only one row with two columns: table_1 and table_2 whose values can be 0 (meaning: data for that table is not ready) or 1 (data ready).

Furthermore, I'd create two stored procedures which look the same; the only difference is that each of them inserts rows into its own table:

create procedure p_insert_1 as
begin
  -- remove old data
  execute immediate 'truncate table table_1';

  -- table_1 data not ready
  update table_done set table_1 = 0;

  -- prepare new data
  insert into table_1 (...) select ...;

  -- table_1 data ready
  insert into table_done (table_1) values (1);
  commit;
end;

The 3rd, "main" procedure, is the one you'd run manually. What would it do? Create two one-time database jobs that run immediately, each of them starting its own p_insert procedure so that they run in parallel. That procedure would then (in a loop) check whether both columns in table_done are set to 1 and - if so - continue execution.

create procedure p_main is
  l_job_1 number;
  l_job_2 number;
  --
  l_t1_done number;
  l_t2_done number;
begin
  dbms_job.submit(l_job_1, 'begin p_insert_1; end;');
  dbms_job.submit(l_job_2, 'begin p_insert_2; end;');

  loop
    select table_1, table_2
      into l_t1_done, l_t2_done
      from table_done;

    if l_t1_done = 1 and l_t2_done = 1 then
       -- exit the loop
       exit;
    else 
       -- tables aren't ready yet; wait 60 seconds and try again
       dbms_lock.sleep(60);
    end if;
  end loop;

  -- process data prepared in table_1 and table_2
end;

That's just a simplified idea; I didn't test it myself so I apologize if there are any errors I made. Also,

  • instead of dbms_job, you could choose to use advanced dbms_scheduler
  • if you're on 18c (or later), use dbms_session.sleep instead of dbms_lock.sleep
  • and so forth

CodePudding user response:

Use SQL parallelism instead of process concurrency. While the words parallelism and concurrency are colloquially interchangeable, in Oracle they have different meanings. Parallelism implies that the SQL engine handles all the coordination of breaking work into little pieces, running those pieces at the same time, and then re-assembling them. Concurrency implies that the user will create multiple sessions and handle the coordination manually.

For simply creating two tables, parallelism will probably be simpler and faster than concurrency. For parallelism, you may only need to add a hint like this:

INSERT /*  APPEND PARALLEL(2) ENABLE_PARALLEL_DML */ ...

The preceding hints may appear strange at first but if you're going to run large Oracle batch jobs, they're worth learning.

  • APPEND says to use direct-path writes. Direct-path writes are much faster than normal writes because they don't generate transactional data used for recovery. But since the process is already truncating the tables, you probably don't care about recovering those two table when you can easily re-load the data.
  • PARALLEL(2) says to run two server processes at the same time while the SQL statement is running. You can easily increase that number, but it's easy to "steal" resources from other sessions.
  • ENABLE_PARALLEL_DML tells Oracle to run the writing portion of the SQL in parallel. Writing the data in parallel, instead of just reading the source data in parallel, will put an extra lock on the whole table until the process is done. However, since you're truncating the table anyway, that is an acceptable risk. The common theme with these hints is that we have to constantly tell Oracle, "I don't care about the repercussions; I can reload data later if something breaks, and I need this process to run as fast as possible."

DBMS_SCHEDULER and other concurrency mechanisms are powerful and useful, but I recommend avoiding them if possible. Running and monitoring scheduler jobs will likely be much more complicated than the 45 character hint string previously discussed. (Although you may still need to occasionally monitor the parallel SQL statement using a tool like OEM SQL Monitor Reports to ensure that the server is actually using the requested parallelism.)

  • Related