I'm constructing a CTE called sorted
that will be sorted by 3 columns, all in ascending order:
- first
sort_1
(a non-null integer), - then
sort_2
(a boolean with nulls last), - then
sort_3
(a non-null timestamp)
Here's a sample of what the results of the CTE look like. Notice in addition to the sorted-on columns, there is an id
column, and then two non-null boolean columns to represent some arbitrary condition.
My main query will query those columns after first constructing the CTE.
id | sort_1 | sort_2 | sort_3 | condition_1 | condition_2
---- -------- -------- ------------------------ ------------- -------------
6 | 1 | | 2022-12-06 00:00:00 00 | f | t
4 | 2 | f | 2022-12-04 00:00:00 00 | t | f
5 | 2 | | 2022-12-05 00:00:00 00 | f | f
... Lots of other rows ...
1 | 997 | t | 2022-12-01 00:00:00 00 | f | f
3 | 997 | t | 2022-12-03 00:00:00 00 | f | t
2 | 998 | f | 2022-12-02 00:00:00 00 | t | t
Given the following query:
WITH sorted AS (
-- Create CTE Here ---
-- ...
ORDER BY sort_1, sort_2 NULLS LAST, sort_3
)
(SELECT id FROM sorted WHERE condition_1 = 't' LIMIT 1)
UNION
(SELECT id FROM sorted WHERE condition_2 = 't' LIMIT 1);
Am I always guaranteed to get ids 4
and 6
back (the rows that fulfill those conditions and appear first in the sorted CTE intermediate result), no matter how many or few rows are returned from the original sorted CTE, or is there any chance other rows that fulfill those conditions could be returned instead? (For example, condition_1 = 't'
is also true for the row with an id
of 2
)
Put another way: will Postgres scan from the top of the table or will it scan the table in a different way? Do I have any way of knowing how it will perform the scan? Do I need to duplicate the ORDER BY
clause in each subsequent query after the CTE if I want the order to be guaranteed for every result?
CodePudding user response:
You ask for one matching column in no particular order. SQL is a declarative language. It will give you what you ask for. There is no inherent order. The order cannot be relied on unless you specify an order.
Due to how the query is implemented you'll probably get the first one returned by the cte. PostgreSQL will probably produce the same query plan for both with z ( select * from x order by y) select * from z limit 1
and select * from x order by y limit 1
, but there is no guarantee.
From the PostgreSQL LIMIT docs...
...SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.
It is even possible for repeated executions of the same LIMIT query to return different subsets of the rows of a table, if there is not an ORDER BY to enforce selection of a deterministic subset. Again, this is not a bug; determinism of the results is simply not guaranteed in such a case.
For example, if PostgreSQL scans sequentially it will give you the first matching row. But what if decides a parallel scan is best? Now multiple workers are scanning different parts of the table at the same time; PostgreSQL will return the first one that matches.
Another example is synchronized seq scans.
This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.
If you want to guarantee an order, you have to specify it.
WITH things AS (
-- Create CTE Here ---
-- ...
)
SELECT id
FROM things
WHERE condition_1 = 't'
ORDER BY sort_1, sort_2 NULLS LAST, sort_3
LIMIT 1
UNION
SELECT id
FROM things
WHERE condition_2 = 't'
ORDER BY sort_1, sort_2 NULLS LAST, sort_3
LIMIT 1;