I am using DBeaver to query from two different Oracle version 19C & 11G. I'm currently using [ojdbc6-11.2.0.4]. When I use
SELECT DISTINCT a, b, c, d FROM <table_name>
It shows two different order in data. With Oracle 11G it was sorted meanwhile Oracle 19C didn't. Example:
Oracle 19C
a | b | c | d |
---|---|---|---|
1123 | Something here | Something here | Something here |
789 | Something here | Something here | Something here |
145 | Something here | Something here | Something here |
1249 | Something here | Something here | Something here |
Oracle 11G
a | b | c | d |
---|---|---|---|
145 | Something here | Something here | Something here |
789 | Something here | Something here | Something here |
1123 | Something here | Something here | Something here |
1249 | Something here | Something here | Something here |
I dont know how this phenomena happen? I read some document from other resources that said Oracle 11G and 19C use different algorithm for SELECT DISTINCT but I couldn't find any Oracle's Documents mention this. Please help me to find any Oracle Documentation that mention its.
CodePudding user response:
As others already stated, without ORDER BY it's impossible to expect the same results. But your query is using DISTINCT, in that situation there is in fact an implicit ORDER BY applied, but it's done on an HASH value internally calculated by ORACLE (... after all, ORACLE had to decide about the distinct or not...). What you are showing here, is that the internal hash is probably not using the same algorithm between the 2 ORACLE versions, or any other internal factor has modified the hash of rows between the 2 environments.
CodePudding user response:
This has nothing to do with Oracle really, but is defined in the SQL standard. From the sql1992.txt (https://datacadamia.com/data/type/relation/sql/ansi):
3.1.3 Definitions provided in this International Standard
multiset: An unordered collection of objects that are not necessarily distinct. The collection may be empty.
4.9 Tables
A table is a multiset of rows.
A table is either a base table, a viewed table, or a derived table.
A derived table is a table derived directly or indirectly from one or more other tables by the evaluation of a <query expression>.
20.2 <direct select statement: multiple rows>
General Rules
Let Q be the result of the <query expression>.
If an <order by clause> is not specified, then the ordering of the rows of Q is implementation-dependent.
The paragraph 20.2 alone would suffice to see that a query result is an unordered data set as long as no ORDER BY
clause is used.
I've added the other paragraphs to show that actually all tables in a database are unordered data sets, and a query result is a table, hence unordered.
You can order your final query result with an ORDER BY
clause. Without an ORDER BY
you can get the rows in arbitrary order. This does not only mean you can get a different row order from one DBMS version to another, but even in the same DBMS version when running the same query twice.
To illustrate this even further: When you read from a database table, you get a result in arbitrary order. In below query we select all rows from the original table and order its rows. But then we use this query result (i.e. table) to read from it again, so once more we get the result rows in arbitrary order. The DBMS is free to ignore the ORDER BY
clause altogether, as it doesn't influence the final result. Some DBMS even raise an error here, because they guess that you placed the ORDER BY
by mistake and the query may hence not do what you actually want it to do.
select *
from
(
select *
from mytable
order by id
) only_seemingly_ordered;