What is ORacle SQL's Method for Obtaining Distincts-CodePudding

I am trying to better understand how Oracle SQL determines distincts when executing code. I would expect that it works on one column at a time and then any duplicates are then set aside and the next column is compared only on those rows. This would continue by verifying the next column has/does not have any duplicates and only the rows that continue to have duplicates are put through the next step each time. Is this accurate? Additionally, is there a method that it uses to determine on which column to start? Some columns are very likely to have duplicates while others might be keys that should not contain any (assuming duplication didn't occur on a join or union).

CodePudding user response：

There are two basic methods:

Building a hash table of the unique values
Sorting the data & de-duplicating in the process

Hashing tends to be more efficient.

Sorting is preferred in situations including:

The query has an order by - the data has to be sorted anyway, so there's no point hashing and sorting
There's an index on the distinct expression - the database can walk through the index to find the unique values. If it's a unique index the de-duplication step can be skipped too as the values are guaranteed to be unique.

The optimizer may consider other factors too (such as the expected number of distinct value, number of nulls, total data set size, etc.). The exact details of when the optimizer chooses a sort over hashing can change from version to version.

Here are some examples:

-- distinct a non-indexed column => hash
select distinct salary from hr.employees;

select * 
from   table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  HASH UNIQUE       |           |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |
----------------------------------------
*/

-- order by & distinct => sort
select distinct salary from hr.employees
order  by salary;

select * 
from   table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  SORT UNIQUE       |           |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |
----------------------------------------
*/

-- index on distinct column => sort
select distinct department_id from hr.employees;

select * 
from   table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
------------------------------------------------
| Id  | Operation          | Name              |
------------------------------------------------
|   0 | SELECT STATEMENT   |                   |
|   1 |  SORT UNIQUE NOSORT|                   |
|   2 |   INDEX FULL SCAN  | EMP_DEPARTMENT_IX |
------------------------------------------------
*/

-- distinct of unique/primary index => no sort/hash needed! Just read the index
select distinct employee_id from hr.employees;

select * 
from   table ( dbms_xplan.display_cursor ( format => 'BASIC' ) );
/*
------------------------------------------
| Id  | Operation        | Name          |
------------------------------------------
|   0 | SELECT STATEMENT |               |
|   1 |  INDEX FULL SCAN | EMP_EMP_ID_PK |
------------------------------------------
*/