Equivalent for FETCH FIRST WITH TIES in Postgres 11 with comparable performance-CodePudding

Given the following DDL

CREATE TABLE numbers (
    id     BIGSERIAL PRIMARY KEY,
    number BIGINT
);
CREATE INDEX ON numbers (number);

In PostgreSQL 13 I can use the following query:

SELECT *
FROM numbers
WHERE number > 1
ORDER BY number
FETCH FIRST 1000 ROWS WITH TIES

It produces a very effective query plan and performs well enough with large tables:

Limit  (cost=...)                                                   
  ->  Index Scan using numbers_number_idx on numbers  (cost=...) 
        Index Cond: (number > 1)

Is there an equivalent for that in PostgreSQL 11 that gives a similar query plan and comparable performance for large (10TB ) tables?

This answer suggests to use the following query:

WITH cte AS (
    SELECT *, RANK() OVER (ORDER BY number) AS rnk
    FROM numbers
    WHERE number > 1
)
SELECT *
FROM cte
WHERE rnk <= 1000;

But it is not really usable for large tables because performance is many times worse.

Subquery Scan on cte  (cost=...)                                       
  Filter: (cte.rnk <= 1000)                                                                       
  ->  WindowAgg  (cost=...)                                            
        ->  Index Scan using numbers_number_idx on numbers  (cost=...) 
              Index Cond: (number > 1)

CodePudding user response：

You won't get the performance of the extremely convenient WITH TIES in Postgres 13 or later. See:

Greater than or equal to ALL() and equal to MAX() speed

But there are always options ...

Faster SQL variant

Should be faster for big tables because it avoids to rank the whole table like the simple solution in the referenced answer.

WITH cte AS (
   SELECT *, row_number() OVER (ORDER  BY number, id) AS rn  -- ORDER must match
   FROM   numbers
   WHERE  number > 1
   ORDER  BY number, id  -- tiebreaker to make it deterministic
   LIMIT  1000
   )
TABLE cte
UNION ALL
(
SELECT n.*, 1000   row_number() OVER (ORDER BY n.id) AS rn
FROM   numbers n, (SELECT number, id FROM cte WHERE rn = 1000) AS max
WHERE  n.number = max.number
AND    n.id > max.id
ORDER  BY n.id
);

id is any UNIQUE NOT NULL column in your table that lends itself as tiebreaker, typically the PRIMARY KEY.

Sort by that additionally, which has the (maybe welcome) side effect that you get a deterministically ordered result.

Ideally, you have a multicolumn index on (number, id) for this. But it will use the existing index on just (number), too. Resulting performance depends on cardinalities and data types.
Related:

Is a composite index also good for queries on the first field?

Since the CTE query counts as one command, there is no race condition under concurrent write load. All parts of the command see the same snapshot of the table in default isolation level READ COMMITTED.

I might wrap this into a simple SQL function (or prepared statement) and pass LIMIT to it for convenience.

Alternative with PL/pgSQL

Only running a single SELECT probably outweighs the added overhead. Works optimally with the existing index on just (number):

CREATE OR REPLACE FUNCTION foo(_bound int, _limit int, OUT _row public.numbers)
  RETURNS SETOF public.numbers
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct int := 0;
   _number int;  -- match type!
BEGIN
   FOR _row IN
      SELECT *
      FROM   public.numbers
      WHERE  number > _bound
      ORDER  BY number
   LOOP
      _ct := _ct   1;
      CASE 
      WHEN _ct < _limit THEN
         -- do nothing
      WHEN _ct > _limit THEN
         EXIT WHEN _row.number > _number;
      WHEN _ct = _limit THEN
         _number := _row.number;   
      END CASE;

      RETURN NEXT;
   END LOOP;
END
$func$;

But it's tailored for just the one query. Gets tedious for varying queries.

CodePudding user response：

I got pretty good results with the following select query. Not as fast as the other answer, but the query is also a lot simpler. the offset value has to be 1 less than the number of rows desired (999 for 1000) since offset starts at 0.

-- query 1
select *
from numbers 
where number <= (
  select number 
  from numbers
  where number > 1
  order by number
  offset 999 limit 1
  ) 
  and number > 1

Performance

test was run with the table as defined in the question, and the following insert statement.

INSERT INTO numbers (n) 
SELECT floor(random() * 100000000) 
FROM generate_series(1, 100000000);

I'll admit its not very scientific, as this is done on my puny windows 10 laptop & running on a postgresql 13 cluster on wsl with all default settings on battery power. i didn't restart the server between runs or do anything to prevent optimization due to data present in cache.

Explain analyze results, where query 1 refers to the query above, query 2 refers to the simple query in question, query 3 refers to query in other answer and query 4 is the elegant with ties variant.

query	execution time	factor slower
1	80.642 ms	63.6
2	142142.026 ms	112011
3	59.551 ms	46.9
4	1.269 ms	1

Explain analyze query plans

-- query 1
 Gather  (cost=10965.26..1178206.10 rows=500000 width=16) (actual time=64.157..79.186 rows=1000 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Limit  (cost=27.66..27.69 rows=1 width=8) (actual time=63.613..63.615 rows=1 loops=1)
           ->  Index Only Scan using numbers_n_idx on numbers numbers_1  (cost=0.57..2712066.05 rows=100000085 width=8) (actual time=0.029..0.318 rows=1000 loops=1)
                 Index Cond: (number > 1)
                 Heap Fetches: 0
   ->  Parallel Bitmap Heap Scan on numbers  (cost=9937.57..1127178.41 rows=208333 width=16) (actual time=0.090..0.498 rows=333 loops=3)
         Recheck Cond: ((number <= $0) AND (number > 1))
         Heap Blocks: exact=1000
         ->  Bitmap Index Scan on numbers_n_idx  (cost=0.00..9812.57 rows=500000 width=0) (actual time=0.135..0.135 rows=1000 loops=1)
               Index Cond: ((number <= $0) AND (number > 1))
 Planning Time: 0.230 ms
 JIT:
   Functions: 18
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.936 ms, Inlining 44.863 ms, Optimization 12.045 ms, Emission 6.182 ms, Total 65.026 ms
 Execution Time: 80.642 ms
(20 rows)

-- query 2
 Subquery Scan on cte  (cost=8472029.70..22868689.85 rows=33333362 width=24) (actual time=40001.966..142074.323 rows=1000 loops=1)
   Filter: (cte.rnk <= 1000)
   Rows Removed by Filter: 99998999
   ->  WindowAgg  (cost=8472029.70..21618688.79 rows=100000085 width=24) (actual time=40001.964..129209.141 rows=99999999 loops=1)
         ->  Gather Merge  (cost=8472029.70..20118687.52 rows=100000085 width=16) (actual time=40001.921..75176.460 rows=99999999 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=8471029.68..8575196.43 rows=41666702 width=16) (actual time=39707.316..46741.888 rows=33333333 loops=3)
                     Sort Key: numbers.number
                     Sort Method: external merge  Disk: 856888kB
                     Worker 0:  Sort Method: external merge  Disk: 839696kB
                     Worker 1:  Sort Method: external merge  Disk: 847688kB
                     ->  Parallel Seq Scan on numbers  (cost=0.00..1061374.79 rows=41666702 width=16) (actual time=60.334..13404.824 rows=33333333 loops=3)
                           Filter: (number > 1)
                           Rows Removed by Filter: 0
 Planning Time: 0.179 ms
 JIT:
   Functions: 16
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.773 ms, Inlining 90.193 ms, Optimization 55.148 ms, Emission 34.709 ms, Total 181.823 ms
 Execution Time: 142142.026 ms
(21 rows)

-- query 3
 Append  (cost=2497.62..2628.54 rows=1005 width=24) (actual time=12.881..59.360 rows=1000 loops=1)
   CTE cte
     ->  Limit  (cost=1004.33..2497.62 rows=1000 width=24) (actual time=12.877..58.683 rows=1000 loops=1)
           ->  WindowAgg  (cost=1004.33..149329948.33 rows=100000085 width=24) (actual time=12.875..58.435 rows=1000 loops=1)
                 ->  Gather Merge  (cost=1004.33..147579946.84 rows=100000085 width=16) (actual time=12.858..57.988 rows=1001 loops=1)
                       Workers Planned: 2
                       Workers Launched: 2
                       ->  Incremental Sort  (cost=4.31..136036455.76 rows=41666702 width=16) (actual time=0.147..29.746 rows=366 loops=3)
                             Sort Key: numbers.number, numbers.id
                             Presorted Key: numbers.number
                             Full-sort Groups: 13  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                             Worker 0:  Full-sort Groups: 15  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                             Worker 1:  Full-sort Groups: 8  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
                             ->  Parallel Index Scan using numbers_n_idx on numbers  (cost=0.57..134359882.50 rows=41666702 width=16) (actual time=0.042..29.576 rows=393 loops=3)
                                   Index Cond: (number > 1)
   ->  CTE Scan on cte  (cost=0.00..20.00 rows=1000 width=24) (actual time=12.880..14.618 rows=1000 loops=1)
   ->  WindowAgg  (cost=105.75..105.85 rows=5 width=24) (actual time=0.079..0.081 rows=0 loops=1)
         ->  Sort  (cost=105.75..105.76 rows=5 width=16) (actual time=0.077..0.079 rows=0 loops=1)
               Sort Key: n.id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.57..105.69 rows=5 width=16) (actual time=0.070..0.071 rows=0 loops=1)
                     ->  CTE Scan on cte cte_1  (cost=0.00..22.50 rows=5 width=16) (actual time=0.051..0.051 rows=1 loops=1)
                           Filter: (rn = 1000)
                           Rows Removed by Filter: 999
                     ->  Index Scan using numbers_n_idx on numbers n  (cost=0.57..16.63 rows=1 width=16) (actual time=0.015..0.016 rows=0 loops=1)
                           Index Cond: (number = cte_1.number)
                           Filter: (id > cte_1.id)
                           Rows Removed by Filter: 1
 Planning Time: 0.202 ms
 Execution Time: 59.551 ms
(30 rows)

-- query 4
Limit  (cost=0.57..1350.00 rows=1000 width=16) (actual time=0.013..1.113 rows=1000 loops=1)
   ->  Index Scan using numbers_n_idx on numbers  (cost=0.57..134943216.33 rows=100000085 width=16) (actual time=0.012..0.869 rows=1001 loops=1)
         Index Cond: (number > 1)
 Planning Time: 0.142 ms
 Execution Time: 1.269 ms
(5 rows)