Does indexing works with "WITH" in Oracle-CodePudding

I have query something like

WITH
str_table as (
  SELECT stringtext, stringnumberid
  FROM STRING_TABLE 
  WHERE LANGID IN (23,62) 
),
data as (
  select * 
  from employee emp
  left outer join str_table st on emp.nameid = st.stringnumberid
)
select * from data

I know With clause will work in this manner

Step 1 : The SQL Query within the with clause is executed at first step.

Step 2 : The output of the SQL query is stored into temporary relation of with clause.

Step 3 : The Main query is executed with temporary relation produced at the last stage.

Now I want to ask whether the indexes created on the actual STRING_TABLE are going to help in temporary str_table produce by the With clause? I want to ask whether the indexes also have impact on str_table or not?

CodePudding user response：

Oracle will not process CTE one by one. It will analyze the SQL as a whole. Your SQL is most likely the same as following in the eye of Oracle optimizer

select emp.* 
from employee emp left outer join STRING_TABLE st 
on emp.nameid = st.stringnumberid
where   st.LANGID IN (23,62);

Oracle can use index on STRING_TABLE. Whether it will depends on the table statistics. For example, if the table has few rows (say a few hundred), Oracle will likely not use index.

CodePudding user response：

It depends.

First of all, with clause is not a temporary table. As documentation says:

Oracle Database optimizes the query by treating the query name as either an inline view or as a temporary table.

Optimizer decides to materialize with subquery if either you forse it to do so by using /* materialize*/ hint inside the subquery or you reuse this with subquery more than once.

In the example below Oracle uses with clause as inline view and merges it within the main query:

explain plan for
with a as (
  select
    s.textid,
    s.textvalue,
    a.id,
    a.other_column
  from string_table s
    join another_tab a
      on s.textid = a.textid
  where langid in (1)
)
select *
from big_table b
  join a a_name
  on b.name_textid = a_name.textid
    and b.job_textid = a_name.id

| PLAN_TABLE_OUTPUT                                                                    |
| :----------------------------------------------------------------------------------- |
| Plan hash value: 1854049435                                                          |
|                                                                                      |
| ------------------------------------------------------------------------------------ |
| | Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     | |
| ------------------------------------------------------------------------------------ |
| |   0 | SELECT STATEMENT    |              |     1 |  1147 |    74   (0)| 00:00:01 | |
| |*  1 |  HASH JOIN          |              |     1 |  1147 |    74   (0)| 00:00:01 | |
| |   2 |   TABLE ACCESS FULL | ANOTHER_TAB  |    39 |  3042 |     3   (0)| 00:00:01 | |
| |*  3 |   HASH JOIN         |              |    31 | 33139 |    71   (0)| 00:00:01 | |
| |   4 |    TABLE ACCESS FULL| BIG_TABLE    |    19 | 10279 |     3   (0)| 00:00:01 | |
| |*  5 |    TABLE ACCESS FULL| STRING_TABLE |  1143 |   589K|    68   (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------ |

But depending on the statistics and hints it may evaluate subquery first and then add it to the main query:

explain plan for
with a as (
  select
    s.textid,
    s.textvalue,
    a.id,
    a.other_column
  from string_table s
    join another_tab a
      on s.textid = a.textid
  where langid in (1)
)
select /* NO_MERGE(a_name)*/ *
from big_table b
  join a a_name
  on b.name_textid = a_name.textid
    and b.job_textid = a_name.id

| PLAN_TABLE_OUTPUT                                                                     |
| :------------------------------------------------------------------------------------ |
| Plan hash value: 4105667421                                                           |
|                                                                                       |
| ------------------------------------------------------------------------------------- |
| | Id  | Operation            | Name         | Rows  | Bytes | Cost (%CPU)| Time     | |
| ------------------------------------------------------------------------------------- |
| |   0 | SELECT STATEMENT     |              |   101 |   110K|    74   (0)| 00:00:01 | |
| |*  1 |  HASH JOIN           |              |   101 |   110K|    74   (0)| 00:00:01 | |
| |   2 |   TABLE ACCESS FULL  | BIG_TABLE    |    19 | 10279 |     3   (0)| 00:00:01 | |
| |   3 |   VIEW               |              |    64 | 37120 |    71   (0)| 00:00:01 | |
| |*  4 |    HASH JOIN         |              |    64 | 38784 |    71   (0)| 00:00:01 | |
| |   5 |     TABLE ACCESS FULL| ANOTHER_TAB  |    39 |  3042 |     3   (0)| 00:00:01 | |
| |*  6 |     TABLE ACCESS FULL| STRING_TABLE |  1143 |   589K|    68   (0)| 00:00:01 | |
| ------------------------------------------------------------------------------------- |

When you use with subquery twice, optimizer decides to materialize it:

explain plan for
with a as (
  select
    s.textid,
    s.textvalue
  from string_table s
  where langid in (1)
)
select *
from big_table b
  join a a_name
  on b.name_textid = a_name.textid
  join a a_job
  on b.job_textid = a_job.textid

| PLAN_TABLE_OUTPUT                                                                                                      |
| :--------------------------------------------------------------------------------------------------------------------- |
| Plan hash value: 1371454296                                                                                            |
|                                                                                                                        |
| ---------------------------------------------------------------------------------------------------------------------- |
| | Id  | Operation                                | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | |
| ---------------------------------------------------------------------------------------------------------------------- |
| |   0 | SELECT STATEMENT                         |                           |    63 | 98973 |    67   (0)| 00:00:01 | |
| |   1 |  TEMP TABLE TRANSFORMATION               |                           |       |       |            |          | |
| |   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D7224_469C01 |       |       |            |          | |
| |   3 |    TABLE ACCESS BY INDEX ROWID BATCHED   | STRING_TABLE              |   999 |   515K|    22   (0)| 00:00:01 | |
| |*  4 |     INDEX RANGE SCAN                     | IX                        |   999 |       |     4   (0)| 00:00:01 | |
| |*  5 |   HASH JOIN                              |                           |    63 | 98973 |    45   (0)| 00:00:01 | |
| |*  6 |    HASH JOIN                             |                           |    35 | 36960 |    24   (0)| 00:00:01 | |
| |   7 |     TABLE ACCESS FULL                    | BIG_TABLE                 |    19 | 10279 |     3   (0)| 00:00:01 | |
| |   8 |     VIEW                                 |                           |   999 |   502K|    21   (0)| 00:00:01 | |
| |   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D7224_469C01 |   999 |   502K|    21   (0)| 00:00:01 | |
| |  10 |    VIEW                                  |                           |   999 |   502K|    21   (0)| 00:00:01 | |
| |  11 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D7224_469C01 |   999 |   502K|    21   (0)| 00:00:01 | |
| ---------------------------------------------------------------------------------------------------------------------- |

So when there are some indexes on tables inside with subquery they may be used in all above cases: before materialization, when subquery is not merged and when subquery is merged and some idexes provide better query plan on merged subquery (even when those indexes are not used when you execute subquery alone).

What about idexes: if they provide high selectivity (i.e. number of rows retrieved by index is small compared to the overall number of rows), then Oracle will consider to use it. Note, that index access has two steps: read index blocks and then read table blocks that contain rowids found by index. If table size is not much bigger than index size, then Oracle may use table scan instead of index scan even for quite selective predicate (because of doubled IO).

In the below example I've used "small" texts (100 chars) and big_table table of 20 rows and this index for text table:

create index ix
on string_table(langid, textid)

Optimizer decides to use index range scan and read only blocks of the first level (first column of the index):

explain plan for
with a as (
  select
    s.textid,
    s.textvalue
  from string_table s
  where langid in (1)
)
select *
from big_table b
  join a a_name
  on b.name_textid = a_name.textid

| PLAN_TABLE_OUTPUT                                                                                     |
| :---------------------------------------------------------------------------------------------------- |
| Plan hash value: 1660330381                                                                           |
|                                                                                                       |
| ----------------------------------------------------------------------------------------------------- |
| | Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     | |
| ----------------------------------------------------------------------------------------------------- |
| |   0 | SELECT STATEMENT                     |              |    29 | 31001 |    26   (0)| 00:00:01 | |
| |*  1 |  HASH JOIN                           |              |    29 | 31001 |    26   (0)| 00:00:01 | |
| |   2 |   TABLE ACCESS FULL                  | BIG_TABLE    |    19 | 10279 |     3   (0)| 00:00:01 | |
| |   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| STRING_TABLE |   999 |   515K|    23   (0)| 00:00:01 | |
| |*  4 |    INDEX RANGE SCAN                  | IX           |   999 |       |     4   (0)| 00:00:01 | |
| ----------------------------------------------------------------------------------------------------- |
|                                                                                                       |
| Predicate Information (identified by operation id):                                                   |
| ---------------------------------------------------                                                   |
|                                                                                                       |
|    1 - access("B"."NAME_TEXTID"="S"."TEXTID")                                                         |
|    4 - access("LANGID"=1)                                                                             |                                                                      |

But when we reduce the number of rows in big_table, it uses both the columns for index scan:

delete from big_table
where id > 4

explain plan for
with a as (
  select
    s.textid,
    s.textvalue
  from string_table s
  where langid in (1)
)
select *
from big_table b
  join a a_name
  on b.name_textid = a_name.textid

| PLAN_TABLE_OUTPUT                                                                             |
| :-------------------------------------------------------------------------------------------- |
| Plan hash value: 1766926914                                                                   |
|                                                                                               |
| --------------------------------------------------------------------------------------------- |
| | Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     | |
| --------------------------------------------------------------------------------------------- |
| |   0 | SELECT STATEMENT             |              |     6 | 18216 |    11   (0)| 00:00:01 | |
| |   1 |  NESTED LOOPS                |              |     6 | 18216 |    11   (0)| 00:00:01 | |
| |   2 |   NESTED LOOPS               |              |     6 | 18216 |    11   (0)| 00:00:01 | |
| |   3 |    TABLE ACCESS FULL         | BIG_TABLE    |     4 |  4032 |     3   (0)| 00:00:01 | |
| |*  4 |    INDEX RANGE SCAN          | IX           |     1 |       |     1   (0)| 00:00:01 | |
| |   5 |   TABLE ACCESS BY INDEX ROWID| STRING_TABLE |     2 |  4056 |     2   (0)| 00:00:01 | |
| --------------------------------------------------------------------------------------------- |
|                                                                                               |
| Predicate Information (identified by operation id):                                           |
| ---------------------------------------------------                                           |
|                                                                                               |
|    4 - access("LANGID"=1 AND "B"."NAME_TEXTID"="S"."TEXTID")                                  |
|                                                                                               |

You may check above code snippets in the db<>fiddle.