Given the following DDL
CREATE TABLE numbers (
id BIGSERIAL PRIMARY KEY,
number BIGINT
);
CREATE INDEX ON numbers (number);
In PostgreSQL 13 I can use the following query:
SELECT *
FROM numbers
WHERE number > 1
ORDER BY number
FETCH FIRST 1000 ROWS WITH TIES
It produces a very effective query plan and performs well enough with large tables:
Limit (cost=...)
-> Index Scan using numbers_number_idx on numbers (cost=...)
Index Cond: (number > 1)
Is there an equivalent for that in PostgreSQL 11 that gives a similar query plan and comparable performance for large (10TB ) tables?
This answer suggests to use the following query:
WITH cte AS (
SELECT *, RANK() OVER (ORDER BY number) AS rnk
FROM numbers
WHERE number > 1
)
SELECT *
FROM cte
WHERE rnk <= 1000;
But it is not really usable for large tables because performance is many times worse.
Subquery Scan on cte (cost=...)
Filter: (cte.rnk <= 1000)
-> WindowAgg (cost=...)
-> Index Scan using numbers_number_idx on numbers (cost=...)
Index Cond: (number > 1)
CodePudding user response:
You won't get the performance of the extremely convenient WITH TIES
in Postgres 13 or later. See:
But there are always options ...
Faster SQL variant
Should be faster for big tables because it avoids to rank the whole table like the simple solution in the referenced answer.
WITH cte AS (
SELECT *, row_number() OVER (ORDER BY number, id) AS rn -- ORDER must match
FROM numbers
WHERE number > 1
ORDER BY number, id -- tiebreaker to make it deterministic
LIMIT 1000
)
TABLE cte
UNION ALL
(
SELECT n.*, 1000 row_number() OVER (ORDER BY n.id) AS rn
FROM numbers n, (SELECT number, id FROM cte WHERE rn = 1000) AS max
WHERE n.number = max.number
AND n.id > max.id
ORDER BY n.id
);
id
is any UNIQUE NOT NULL
column in your table that lends itself as tiebreaker, typically the PRIMARY KEY
.
Sort by that additionally, which has the (maybe welcome) side effect that you get a deterministically ordered result.
Ideally, you have a multicolumn index on (number, id)
for this. But it will use the existing index on just (number)
, too. Resulting performance depends on cardinalities and data types.
Related:
Since the CTE query counts as one command, there is no race condition under concurrent write load. All parts of the command see the same snapshot of the table in default isolation level READ COMMITTED
.
I might wrap this into a simple SQL function (or prepared statement) and pass LIMIT
to it for convenience.
Alternative with PL/pgSQL
Only running a single SELECT
probably outweighs the added overhead. Works optimally with the existing index on just (number)
:
CREATE OR REPLACE FUNCTION foo(_bound int, _limit int, OUT _row public.numbers)
RETURNS SETOF public.numbers
LANGUAGE plpgsql AS
$func$
DECLARE
_ct int := 0;
_number int; -- match type!
BEGIN
FOR _row IN
SELECT *
FROM public.numbers
WHERE number > _bound
ORDER BY number
LOOP
_ct := _ct 1;
CASE
WHEN _ct < _limit THEN
-- do nothing
WHEN _ct > _limit THEN
EXIT WHEN _row.number > _number;
WHEN _ct = _limit THEN
_number := _row.number;
END CASE;
RETURN NEXT;
END LOOP;
END
$func$;
But it's tailored for just the one query. Gets tedious for varying queries.
CodePudding user response:
I got pretty good results with the following select query. Not as fast as the other answer, but the query is also a lot simpler. the offset value has to be 1 less than the number of rows desired (999 for 1000) since offset starts at 0.
-- query 1
select *
from numbers
where number <= (
select number
from numbers
where number > 1
order by number
offset 999 limit 1
)
and number > 1
Performance
test was run with the table as defined in the question, and the following insert statement.
INSERT INTO numbers (n)
SELECT floor(random() * 100000000)
FROM generate_series(1, 100000000);
I'll admit its not very scientific, as this is done on my puny windows 10 laptop & running on a postgresql 13 cluster on wsl with all default settings on battery power. i didn't restart the server between runs or do anything to prevent optimization due to data present in cache.
Explain analyze results, where query 1 refers to the query above, query 2 refers to the simple query in question, query 3 refers to query in other answer and query 4 is the elegant with ties
variant.
query | execution time | factor slower |
---|---|---|
1 | 80.642 ms | 63.6 |
2 | 142142.026 ms | 112011 |
3 | 59.551 ms | 46.9 |
4 | 1.269 ms | 1 |
Explain analyze query plans
-- query 1
Gather (cost=10965.26..1178206.10 rows=500000 width=16) (actual time=64.157..79.186 rows=1000 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
InitPlan 1 (returns $0)
-> Limit (cost=27.66..27.69 rows=1 width=8) (actual time=63.613..63.615 rows=1 loops=1)
-> Index Only Scan using numbers_n_idx on numbers numbers_1 (cost=0.57..2712066.05 rows=100000085 width=8) (actual time=0.029..0.318 rows=1000 loops=1)
Index Cond: (number > 1)
Heap Fetches: 0
-> Parallel Bitmap Heap Scan on numbers (cost=9937.57..1127178.41 rows=208333 width=16) (actual time=0.090..0.498 rows=333 loops=3)
Recheck Cond: ((number <= $0) AND (number > 1))
Heap Blocks: exact=1000
-> Bitmap Index Scan on numbers_n_idx (cost=0.00..9812.57 rows=500000 width=0) (actual time=0.135..0.135 rows=1000 loops=1)
Index Cond: ((number <= $0) AND (number > 1))
Planning Time: 0.230 ms
JIT:
Functions: 18
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.936 ms, Inlining 44.863 ms, Optimization 12.045 ms, Emission 6.182 ms, Total 65.026 ms
Execution Time: 80.642 ms
(20 rows)
-- query 2
Subquery Scan on cte (cost=8472029.70..22868689.85 rows=33333362 width=24) (actual time=40001.966..142074.323 rows=1000 loops=1)
Filter: (cte.rnk <= 1000)
Rows Removed by Filter: 99998999
-> WindowAgg (cost=8472029.70..21618688.79 rows=100000085 width=24) (actual time=40001.964..129209.141 rows=99999999 loops=1)
-> Gather Merge (cost=8472029.70..20118687.52 rows=100000085 width=16) (actual time=40001.921..75176.460 rows=99999999 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=8471029.68..8575196.43 rows=41666702 width=16) (actual time=39707.316..46741.888 rows=33333333 loops=3)
Sort Key: numbers.number
Sort Method: external merge Disk: 856888kB
Worker 0: Sort Method: external merge Disk: 839696kB
Worker 1: Sort Method: external merge Disk: 847688kB
-> Parallel Seq Scan on numbers (cost=0.00..1061374.79 rows=41666702 width=16) (actual time=60.334..13404.824 rows=33333333 loops=3)
Filter: (number > 1)
Rows Removed by Filter: 0
Planning Time: 0.179 ms
JIT:
Functions: 16
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.773 ms, Inlining 90.193 ms, Optimization 55.148 ms, Emission 34.709 ms, Total 181.823 ms
Execution Time: 142142.026 ms
(21 rows)
-- query 3
Append (cost=2497.62..2628.54 rows=1005 width=24) (actual time=12.881..59.360 rows=1000 loops=1)
CTE cte
-> Limit (cost=1004.33..2497.62 rows=1000 width=24) (actual time=12.877..58.683 rows=1000 loops=1)
-> WindowAgg (cost=1004.33..149329948.33 rows=100000085 width=24) (actual time=12.875..58.435 rows=1000 loops=1)
-> Gather Merge (cost=1004.33..147579946.84 rows=100000085 width=16) (actual time=12.858..57.988 rows=1001 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Incremental Sort (cost=4.31..136036455.76 rows=41666702 width=16) (actual time=0.147..29.746 rows=366 loops=3)
Sort Key: numbers.number, numbers.id
Presorted Key: numbers.number
Full-sort Groups: 13 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
Worker 0: Full-sort Groups: 15 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
Worker 1: Full-sort Groups: 8 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Parallel Index Scan using numbers_n_idx on numbers (cost=0.57..134359882.50 rows=41666702 width=16) (actual time=0.042..29.576 rows=393 loops=3)
Index Cond: (number > 1)
-> CTE Scan on cte (cost=0.00..20.00 rows=1000 width=24) (actual time=12.880..14.618 rows=1000 loops=1)
-> WindowAgg (cost=105.75..105.85 rows=5 width=24) (actual time=0.079..0.081 rows=0 loops=1)
-> Sort (cost=105.75..105.76 rows=5 width=16) (actual time=0.077..0.079 rows=0 loops=1)
Sort Key: n.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.57..105.69 rows=5 width=16) (actual time=0.070..0.071 rows=0 loops=1)
-> CTE Scan on cte cte_1 (cost=0.00..22.50 rows=5 width=16) (actual time=0.051..0.051 rows=1 loops=1)
Filter: (rn = 1000)
Rows Removed by Filter: 999
-> Index Scan using numbers_n_idx on numbers n (cost=0.57..16.63 rows=1 width=16) (actual time=0.015..0.016 rows=0 loops=1)
Index Cond: (number = cte_1.number)
Filter: (id > cte_1.id)
Rows Removed by Filter: 1
Planning Time: 0.202 ms
Execution Time: 59.551 ms
(30 rows)
-- query 4
Limit (cost=0.57..1350.00 rows=1000 width=16) (actual time=0.013..1.113 rows=1000 loops=1)
-> Index Scan using numbers_n_idx on numbers (cost=0.57..134943216.33 rows=100000085 width=16) (actual time=0.012..0.869 rows=1001 loops=1)
Index Cond: (number > 1)
Planning Time: 0.142 ms
Execution Time: 1.269 ms
(5 rows)