My understanding is that the estimated number of rows in an explain plan for the inner row source of a nested loop join reflects the number of rows for just one iteration of that nested loop.
In the following example, step 6 of the explain plan is an inner row source of a nested loop join that is obtaining a row by one ROWID at a time. Therefore, it ought to have an estimated rows of 1 (there is only 1 row per ROWID).
Why is step 6's table access by index ROWID
showing 100 (I expected it to show 1)?
Using Oracle 19c Enterprise Edition.
drop table "C";
drop table "P";
create table "P"
( p_id NUMBER
, p_num NUMBER
, p_pad CHAR(200 byte)
)
;
insert
into "P" (p_id, p_num, p_pad)
select level
, mod(level-1,200/2)
, ' '
from dual
connect by level <= 200
;
create table "C"
( c_id NUMBER
, p_id NUMBER
, c_pad CHAR(200 byte)
)
;
insert /* append enable_parallel_dml parallel (auto) */
into "C" (c_id, p_id, c_pad)
with
"D" as
( select /* materialize */ null from dual connect by level <= 100
)
select rownum c_id
, p_id p_id
, ' ' c_pad
from "P", "D"
;
commit;
create index IX_P on p (p_num);
create unique index IU_P on p (p_id);
alter table p add constraint UK_P unique (p_id) rely using index IU_P enable validate;
alter table C add constraint R_C foreign key (p_id) references p (p_id) rely enable validate;
create index IR_C on _C (p_id);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'P', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
select /* optimizer_features_enable('19.1.0')
use_nl (P C) */
*
from "P"
join "C"
on P.p_id = C.p_id
and P.p_num = 1
;
plan hash value: 3840235794
----------------------------------------------------------------------------------------------
| id | Operation | name | rows | Bytes | cost (%CPU)| time |
----------------------------------------------------------------------------------------------
| 0 | select statement | | 200 | 83000 | 205 (0)| 00:00:01 |
| 1 | nested LOOPS | | 200 | 83000 | 205 (0)| 00:00:01 |
| 2 | nested LOOPS | | 200 | 83000 | 205 (0)| 00:00:01 |
| 3 | table access by index ROWID BATCHED| P | 2 | 414 | 3 (0)| 00:00:01 |
|* 4 | index range scan | IX_P | 2 | | 1 (0)| 00:00:01 |
|* 5 | index range scan | IR_C | 100 | | 1 (0)| 00:00:01 |
| 6 | table access by index ROWID | C | 100 | 20800 | 101 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."P_NUM"=1)
5 - access("P"."P_ID"="C"."P_ID")
Outer row source step 3 multiplied by Inner row source step 5 = Nested Loop step 2.
However, Outer row source step 2 multiplied by Inner row source step 6 <> Nested Loop step 1.
I agree the total of step 1 ought to be 200, but don't understand why step 6 has an estimated rows of 100.
Why does step 6 have an estimated rows of 100 instead of 1?
Thanks in advance.
CodePudding user response:
Here you can see what row counts will be expected in the outer NESTED LOOP
select p_id, count(*) from C where p_id in (
select p_id from P where p_num = 1)
group by p_id;
P_ID COUNT(*)
---------- ----------
2 100
102 100
So realy each iteration expects to get 100 rows.
If you run the query with the hint gather_plan_statistics
you can see the number of Starts
and the total actual rows A-Rows
.
select /* gather_plan_statistics use_nl (P C) */
*
from "P"
join "C"
on P.p_id = C.p_id
and P.p_num = 1
SQL_ID 927pggk6scpwt, child number 0
-------------------------------------
select /* gather_plan_statistics use_nl (P C) */ * from "P"
join "C" on P.p_id = C.p_id and P.p_num = 1
Plan hash value: 2326820011
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:00:00.01 | 213 |
| 1 | NESTED LOOPS | | 1 | 200 | 200 |00:00:00.01 | 213 |
| 2 | NESTED LOOPS | | 1 | 200 | 200 |00:00:00.01 | 13 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| P | 1 | 2 | 2 |00:00:00.01 | 5 |
|* 4 | INDEX RANGE SCAN | IX_P | 1 | 2 | 2 |00:00:00.01 | 3 |
|* 5 | INDEX RANGE SCAN | IR_C | 2 | 100 | 200 |00:00:00.01 | 8 |
| 6 | TABLE ACCESS BY INDEX ROWID | C | 200 | 100 | 200 |00:00:00.01 | 200 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("P"."P_NUM"=1)
5 - access("P"."P_ID"="C"."P_ID")
I.e. the operation 5 was started two times (column Starts
) with a total of 200 rows (column A-Rows
)
Operation 6 was started 200 times each time to get one row.
CodePudding user response:
I think this Oracle documentation paragraph explains this scenario fairly well:
Multiple nested loops operations can occasionally show up in the execution plan for just one join, which indicates that Oracle used the nested-loop batching optimization technique. What this method does is transform a single join of two row sources into a join of the driving row source to one copy of the probe row source that is joined to a replica of itself on ROWID; since we now have three row sources, we need at least two nested loops. The probe row source copy that is used to perform a self join on ROWID is used to filter rows, so it will have a corresponding
TABLE ACCESS BY ... ROWID
entry in the execution plan. This cost-based optimization can often reduce I/O although the execution plan may not transparently display the benefits.
Step 6 in your example is the "probe row source copy"; it's basically a cached version of table C, so it has 100 rows. But its cost is shared among all of the outer nested loops - the table was only accessed once - so it's already included in the total from step 2. (I think?)