Home > Back-end >  Union-all postgresql select clauses preserving order
Union-all postgresql select clauses preserving order

Time:04-27

Having complex SQL query to RDBMS Postgresql which consists of multiple nested UNION ALL-like nested queries, something like this:

(
  (
     (<QUERY 1-1-1> UNION ALL <QUERY 1-1-2>) UNION ALL
     (<QUERY 1-1-3> UNION ALL <QUERY 1-1-4>) UNION ALL
     ...
  ) UNION ALL
  (
     (<QUERY 1-2-1> UNION ALL <QUERY 1-2-2>) UNION ALL
     (<QUERY 1-2-3> UNION ALL <QUERY 1-2-4>) UNION ALL
     ...
  ) UNION ALL
  ...
) UNION ALL
(
  (
     (<QUERY 2-1-1> UNION ALL <QUERY 2-1-2>) UNION ALL
     (<QUERY 2-1-3> UNION ALL <QUERY 2-1-4>) UNION ALL
     ...
  ) UNION ALL
  (
     (<QUERY 2-2-1> UNION ALL <QUERY 2-2-2>) UNION ALL
     (<QUERY 2-2-3> UNION ALL <QUERY 2-2-4>) UNION ALL
     ...
  ) UNION ALL
  ...
) UNION ALL
(
  ...
)

Each <QUERY i-th> is relatively lightweight query which produces about 100K-1M rows and can be sorted in-memory without significant performance impact.

Result query is consists of tens thousands multi-level nested UNION ALL queries in strict conventional order, like traversing tree in depth, so result query is several billion rows dataset.

So question is: since SQL does not guarantee order of UNION ALL statement, outer query should contain ORDER BY clause, but server hardware cannot perform sorting of billon rows in required time.

However, order of united queries is strict determined, and should be: <QUERY 1-1-1>, <QUERY 1-1-2> and so on, sorted hierarchically, so in fact sorting of outer query is redundant, since dataset is already sorted by sql query structure.

It's necessary to force Postgres to preserve order of nested UNION ALL statements. How to do it? Any plugins, extensions and even dirty hacks are welcome.

Please avoid of answers and comments mention XY-like problem - question is formulated as-is in research manner. Structure of database and dataset cannot be changed by conditions of question. Thanks.

CodePudding user response:

There are 2 ways of looking at this:

  • The safest alternative is be to declare an id column using SERIAL or BIGSERIAL, which will be ordered and indexed. As the records are already ordered there will be a minimal effect on query speed and you will be sure that there are no errors in the ordering.

  • If the order is not critical, and you don't modify the data at all it will probably be fetched in the same order as you entered it. There is no guarantee. How important is the order to your application?

CodePudding user response:

Try this - allocate the queries' results into a temporary table. Here it is step by step:

  • Create a temporary table ex. the_temp_table like the the record type of <QUERY 1-1-1>
create temporary table the_temp_table as <QUERY 1-1-1> limit 0;
  • Add an auto-increment primary key column extra_id to the_temp_table
alter table the_temp_table add column extra_id serial primary key not null;
  • Then run all your queries one by one in the right order
insert into the_temp_table <QUERY 1-1-1>; insert into the_temp_table <QUERY 1-1-2>;
insert into the_temp_table <QUERY 1-1-3>; insert into the_temp_table <QUERY 1-1-4>;

insert into the_temp_table <QUERY 1-2-1>; insert into the_temp_table <QUERY 1-2-2>;
insert into the_temp_table <QUERY 1-2-3>; insert into the_temp_table <QUERY 1-2-4>;

-- continue 
  • Finally
select <fields list w/o extra_id> from the_temp_table order by extra_id;

-- no sorting is taking place here

Effectively thus you will be emulating UNION ALL in a controlled manner with an insignificant performance penalty.

  • Related