Home > database >  Oracle: Inner Row Source of Nested Loop - Incorrect Estimated Rows?
Oracle: Inner Row Source of Nested Loop - Incorrect Estimated Rows?

Time:11-11

My understanding is that the estimated number of rows in an explain plan for the inner row source of a nested loop join reflects the number of rows for just one iteration of that nested loop.

In the following example, step 6 of the explain plan is an inner row source of a nested loop join that is obtaining a row by one ROWID at a time. Therefore, it ought to have an estimated rows of 1 (there is only 1 row per ROWID).

Why is step 6's table access by index ROWID showing 100 (I expected it to show 1)?

Using Oracle 19c Enterprise Edition.

drop table "C";
drop table "P";

create table "P"
  ( p_id  NUMBER
  , p_num NUMBER
  , p_pad CHAR(200 byte)
  )
;

insert
  into "P" (p_id, p_num, p_pad)
  select level
       , mod(level-1,200/2)
       , ' '
    from dual
    connect by level <= 200
;

create table "C"
  ( c_id  NUMBER
  , p_id  NUMBER
  , c_pad CHAR(200 byte)
  )
;

insert /*  append enable_parallel_dml parallel (auto) */
  into "C" (c_id, p_id, c_pad)
  with
    "D" as
    ( select /*  materialize */ null from dual connect by level <= 100
    )
    select rownum      c_id
         , p_id        p_id
         , ' '         c_pad
      from "P", "D"
;
commit;
create index IX_P on p (p_num);   
create unique index IU_P on p (p_id);
alter table p add constraint UK_P unique (p_id) rely using index IU_P enable validate;
alter table C add constraint R_C foreign key (p_id) references p (p_id) rely enable validate;
create index IR_C on _C (p_id);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'P', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
select /*  optimizer_features_enable('19.1.0')
           use_nl (P C) */
       *
  from "P"
       join "C"
         on P.p_id = C.p_id
            and P.p_num = 1
;
plan hash value: 3840235794

----------------------------------------------------------------------------------------------
| id  | Operation                             | name | rows  | Bytes | cost (%CPU)| time     |
----------------------------------------------------------------------------------------------
|   0 | select statement                      |      |   200 | 83000 |   205   (0)| 00:00:01 |
|   1 |  nested LOOPS                         |      |   200 | 83000 |   205   (0)| 00:00:01 |
|   2 |   nested LOOPS                        |      |   200 | 83000 |   205   (0)| 00:00:01 |
|   3 |    table access by index ROWID BATCHED| P    |     2 |   414 |     3   (0)| 00:00:01 |
|*  4 |     index range scan                  | IX_P |     2 |       |     1   (0)| 00:00:01 |
|*  5 |    index range scan                   | IR_C |   100 |       |     1   (0)| 00:00:01 |
|   6 |   table access by index ROWID         | C    |   100 | 20800 |   101   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("P"."P_NUM"=1)
   5 - access("P"."P_ID"="C"."P_ID")

Outer row source step 3 multiplied by Inner row source step 5 = Nested Loop step 2.

However, Outer row source step 2 multiplied by Inner row source step 6 <> Nested Loop step 1.

I agree the total of step 1 ought to be 200, but don't understand why step 6 has an estimated rows of 100.

Why does step 6 have an estimated rows of 100 instead of 1?

Thanks in advance.

CodePudding user response:

Here you can see what row counts will be expected in the outer NESTED LOOP

select p_id, count(*) from C where p_id in (
select p_id from P where p_num = 1)
group by p_id;

      P_ID   COUNT(*)
---------- ----------
         2        100
       102        100

So realy each iteration expects to get 100 rows.

If you run the query with the hint gather_plan_statistics you can see the number of Starts and the total actual rows A-Rows.

select /*  gather_plan_statistics use_nl (P C) */
       *
  from "P"
       join "C"
         on P.p_id = C.p_id
            and P.p_num = 1

SQL_ID  927pggk6scpwt, child number 0
-------------------------------------
select /*  gather_plan_statistics use_nl (P C) */        *   from "P"   
     join "C"          on P.p_id = C.p_id             and P.p_num = 1
 
Plan hash value: 2326820011
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |      |      1 |        |    200 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                         |      |      1 |    200 |    200 |00:00:00.01 |     213 |
|   2 |   NESTED LOOPS                        |      |      1 |    200 |    200 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| P    |      1 |      2 |      2 |00:00:00.01 |       5 |
|*  4 |     INDEX RANGE SCAN                  | IX_P |      1 |      2 |      2 |00:00:00.01 |       3 |
|*  5 |    INDEX RANGE SCAN                   | IR_C |      2 |    100 |    200 |00:00:00.01 |       8 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | C    |    200 |    100 |    200 |00:00:00.01 |     200 |
--------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("P"."P_NUM"=1)
   5 - access("P"."P_ID"="C"."P_ID")

I.e. the operation 5 was started two times (column Starts) with a total of 200 rows (column A-Rows)

Operation 6 was started 200 times each time to get one row.

CodePudding user response:

I think this Oracle documentation paragraph explains this scenario fairly well:

Multiple nested loops operations can occasionally show up in the execution plan for just one join, which indicates that Oracle used the nested-loop batching optimization technique. What this method does is transform a single join of two row sources into a join of the driving row source to one copy of the probe row source that is joined to a replica of itself on ROWID; since we now have three row sources, we need at least two nested loops. The probe row source copy that is used to perform a self join on ROWID is used to filter rows, so it will have a corresponding TABLE ACCESS BY ... ROWID entry in the execution plan. This cost-based optimization can often reduce I/O although the execution plan may not transparently display the benefits.

Step 6 in your example is the "probe row source copy"; it's basically a cached version of table C, so it has 100 rows. But its cost is shared among all of the outer nested loops - the table was only accessed once - so it's already included in the total from step 2. (I think?)

  • Related