ORACLE 11G AND ORACLE 19C HAVE DIFFERENT ORDER WHEN SELECT DISTINCT FROM DATABASE-CodePudding

I am using DBeaver to query from two different Oracle version 19C & 11G. I'm currently using [ojdbc6-11.2.0.4]. When I use

SELECT DISTINCT a, b, c, d FROM <table_name>

It shows two different order in data. With Oracle 11G it was sorted meanwhile Oracle 19C didn't. Example:

Oracle 19C

a	b	c	d
1123	Something here	Something here	Something here
789	Something here	Something here	Something here
145	Something here	Something here	Something here
1249	Something here	Something here	Something here

Oracle 11G

a	b	c	d
145	Something here	Something here	Something here
789	Something here	Something here	Something here
1123	Something here	Something here	Something here
1249	Something here	Something here	Something here

I dont know how this phenomena happen? I read some document from other resources that said Oracle 11G and 19C use different algorithm for SELECT DISTINCT but I couldn't find any Oracle's Documents mention this. Please help me to find any Oracle Documentation that mention its.

CodePudding user response：

As others already stated, without ORDER BY it's impossible to expect the same results. But your query is using DISTINCT, in that situation there is in fact an implicit ORDER BY applied, but it's done on an HASH value internally calculated by ORACLE (... after all, ORACLE had to decide about the distinct or not...). What you are showing here, is that the internal hash is probably not using the same algorithm between the 2 ORACLE versions, or any other internal factor has modified the hash of rows between the 2 environments.

CodePudding user response：

This has nothing to do with Oracle really, but is defined in the SQL standard. From the sql1992.txt (https://datacadamia.com/data/type/relation/sql/ansi):

3.1.3 Definitions provided in this International Standard

multiset: An unordered collection of objects that are not necessarily distinct. The collection may be empty.

4.9 Tables

A table is a multiset of rows.

A table is either a base table, a viewed table, or a derived table.

A derived table is a table derived directly or indirectly from one or more other tables by the evaluation of a <query expression>.

20.2 <direct select statement: multiple rows>

General Rules

Let Q be the result of the <query expression>.
If an <order by clause> is not specified, then the ordering of the rows of Q is implementation-dependent.

The paragraph 20.2 alone would suffice to see that a query result is an unordered data set as long as no ORDER BY clause is used.

I've added the other paragraphs to show that actually all tables in a database are unordered data sets, and a query result is a table, hence unordered.

You can order your final query result with an ORDER BY clause. Without an ORDER BY you can get the rows in arbitrary order. This does not only mean you can get a different row order from one DBMS version to another, but even in the same DBMS version when running the same query twice.

To illustrate this even further: When you read from a database table, you get a result in arbitrary order. In below query we select all rows from the original table and order its rows. But then we use this query result (i.e. table) to read from it again, so once more we get the result rows in arbitrary order. The DBMS is free to ignore the ORDER BY clause altogether, as it doesn't influence the final result. Some DBMS even raise an error here, because they guess that you placed the ORDER BY by mistake and the query may hence not do what you actually want it to do.

select *
from
(
  select *
  from mytable
  order by id
) only_seemingly_ordered;