Why would LIMIT 2 queries work but LIMIT 1 always times out?


I'm using this public Postgres DB of NEAR protocol: https://github.com/near/near-indexer-for-explorer#shared-public-access

postgres://public_readonly:[email protected]/mainnet_explorer

SELECT "public"."receipts"."receipt_id",
    "public"."receipts"."included_in_block_hash",
    "public"."receipts"."included_in_chunk_hash",
    "public"."receipts"."index_in_chunk",
    "public"."receipts"."included_in_block_timestamp",
    "public"."receipts"."predecessor_account_id",
    "public"."receipts"."receiver_account_id",
    "public"."receipts"."receipt_kind",
    "public"."receipts"."originated_from_transaction_hash"
FROM "public"."receipts"
WHERE ("public"."receipts"."receipt_id") IN
        (SELECT "t0"."receipt_id"
            FROM "public"."receipts" AS "t0"
            INNER JOIN "public"."action_receipts" AS "j0" ON ("j0"."receipt_id") = ("t0"."receipt_id")
            WHERE ("j0"."signer_account_id" = 'ryancwalsh.near'
                                        AND "t0"."receipt_id" IS NOT NULL))
ORDER BY "public"."receipts"."included_in_block_timestamp" DESC
LIMIT 1
OFFSET 0

always results in:

ERROR:  canceling statement due to statement timeout
SQL state: 57014

But if I change it to LIMIT 2, the query runs in less than 1 second.

How would that ever be the case? Does that mean the database isn't set up well? Or am I doing something wrong?

P.S. The query here was generated via Prisma. findFirst always times out, so I think I might need to change it to findMany as a workaround.

CodePudding user response:

Your query can be simplified / optimized:

SELECT r.receipt_id
     , r.included_in_block_hash
     , r.included_in_chunk_hash
     , r.index_in_chunk
     , r.included_in_block_timestamp
     , r.predecessor_account_id
     , r.receiver_account_id
     , r.receipt_kind
     , r.originated_from_transaction_hash
FROM   public.receipts r
WHERE  EXISTS (
   SELECT FROM public.action_receipts j
   WHERE  j.receipt_id = r.receipt_id
   AND    j.signer_account_id = 'ryancwalsh.near'
   )
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;

However, that only scratches the surface of your underlying problem.

As Kirk already commented, Postgres chooses a different query plan for LIMIT 1, obviously ignorant of the fact that there are only 90 rows in table action_receipts with signer_account_id = 'ryancwalsh.near', while both involved tables hold more than 220 million rows and are growing steadily.

Changing to LIMIT 2 makes a different query plan seem more favorable, hence the observed difference in performance. (So the query planner has the general idea that the filter is very selective, just not close enough for the corner case of LIMIT 1.)
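
To see what the planner does in each case, you can compare the plans directly. A minimal sketch against the simplified query above (keep in mind that EXPLAIN ANALYZE actually executes the statement, so the LIMIT 1 variant may itself run into the statement timeout):

EXPLAIN (ANALYZE, BUFFERS)
SELECT r.*   -- full column list shortened for brevity
FROM   public.receipts r
WHERE  EXISTS (
   SELECT FROM public.action_receipts j
   WHERE  j.receipt_id = r.receipt_id
   AND    j.signer_account_id = 'ryancwalsh.near'
   )
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;   -- then repeat with LIMIT 2 and compare the two plans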

You should have mentioned cardinalities to set us on the right track.

Knowing that our filter is so selective, we can force a more favorable query plan with a different query:

WITH j AS (
   SELECT receipt_id  -- is PK!
   FROM   public.action_receipts
   WHERE  signer_account_id = 'ryancwalsh.near'
   )
SELECT r.receipt_id
     , r.included_in_block_hash
     , r.included_in_chunk_hash
     , r.index_in_chunk
     , r.included_in_block_timestamp
     , r.predecessor_account_id
     , r.receiver_account_id
     , r.receipt_kind
     , r.originated_from_transaction_hash
FROM   j
JOIN   public.receipts r USING (receipt_id)
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;

This uses the same query plan for LIMIT 1 and LIMIT 2, and either way finishes in under 2 ms in my test:

                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=134904.89..134904.89 rows=1 width=223) (actual time=1.750..1.754 rows=1 loops=1)
   CTE j
     ->  Bitmap Heap Scan on action_receipts  (cost=319.46..41564.59 rows=10696 width=44) (actual time=0.058..0.179 rows=90 loops=1)
           Recheck Cond: (signer_account_id = 'ryancwalsh.near'::text)
           Heap Blocks: exact=73
           ->  Bitmap Index Scan on action_receipt_signer_account_id_idx  (cost=0.00..316.79 rows=10696 width=0) (actual time=0.043..0.043 rows=90 loops=1)
                 Index Cond: (signer_account_id = 'ryancwalsh.near'::text)
   ->  Sort  (cost=93340.30..93367.04 rows=10696 width=223) (actual time=1.749..1.750 rows=1 loops=1)
         Sort Key: r.included_in_block_timestamp DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.70..93286.82 rows=10696 width=223) (actual time=0.089..1.705 rows=90 loops=1)
               ->  CTE Scan on j  (cost=0.00..213.92 rows=10696 width=32) (actual time=0.060..0.221 rows=90 loops=1)
               ->  Index Scan using receipts_pkey on receipts r  (cost=0.70..8.70 rows=1 width=223) (actual time=0.016..0.016 rows=1 loops=90)
                     Index Cond: (receipt_id = j.receipt_id)
 Planning Time: 0.281 ms
 Execution Time: 1.801 ms

The point is that the DB at hand runs Postgres 11, where CTEs are always optimization barriers, so our highly selective CTE is executed first, and Postgres does not attempt to walk the index on (included_in_block_timestamp) under the wrong assumption that it would find a matching row soon enough. (It does not.)

In Postgres 12 or later you would add AS MATERIALIZED to the CTE to guarantee the same effect.
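
For Postgres 12 or later, that could look like the following sketch (same query as above, with r.* standing in for the full column list):

WITH j AS MATERIALIZED (  -- Postgres 12+: forces the CTE to be materialized
   SELECT receipt_id      -- is PK!
   FROM   public.action_receipts
   WHERE  signer_account_id = 'ryancwalsh.near'
   )
SELECT r.*                -- full column list shortened for brevity
FROM   j
JOIN   public.receipts r USING (receipt_id)
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;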

Or you could use the "OFFSET 0 hack" in any version like so:

SELECT r.receipt_id
     , r.included_in_block_hash
     , r.included_in_chunk_hash
     , r.index_in_chunk
     , r.included_in_block_timestamp
     , r.predecessor_account_id
     , r.receiver_account_id
     , r.receipt_kind
     , r.originated_from_transaction_hash
FROM  (
   SELECT receipt_id  -- is PK!
   FROM   public.action_receipts
   WHERE  signer_account_id = 'ryancwalsh.near'
   OFFSET 0
   ) j
JOIN   public.receipts r USING (receipt_id)
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;

Prevents "inlining" of the subquery to the same result. Finishes in < 2ms.

"Fix" the database?

The proper fix depends on the complete picture. The underlying problem is that Postgres overestimates the number of qualifying rows in table action_receipts: the MCV (most-common-values) list cannot keep up with 220 million rows (and growing). It's most probably not just ANALYZE lagging behind, though it could be (autovacuum not properly configured? Rookie mistake?). Depending on the actual cardinalities (data distribution) in action_receipts.signer_account_id and on the access patterns, there are various ways to "fix" it. Two options:

1. Increase n_distinct and STATISTICS

If most values in action_receipts.signer_account_id are equally rare (high cardinality), consider setting a very large n_distinct value for the column, and combine that with a moderately increased STATISTICS target for the same column to counter errors in the other direction (underestimating the number of qualifying rows for values that are actually more common).
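
A sketch of what checking and adjusting that could look like; the concrete numbers below are placeholders and have to be derived from the real data distribution:

-- When did (auto)ANALYZE last refresh statistics for the table?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'action_receipts';

-- What does the planner currently assume about the column?
SELECT n_distinct, null_frac, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'action_receipts'
AND    attname    = 'signer_account_id';

-- Tell the planner the column is highly distinct.
-- A negative value is a fraction of the row count: -0.05 means distinct values ~ 5 % of rows.
ALTER TABLE public.action_receipts
   ALTER COLUMN signer_account_id SET (n_distinct = -0.05);  -- placeholder value

-- Gather a bigger sample / longer MCV list for this column (default target is 100).
ALTER TABLE public.action_receipts
   ALTER COLUMN signer_account_id SET STATISTICS 1000;       -- placeholder value

ANALYZE public.action_receipts;  -- required for the new settings to take effect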

2. Local fix with partial index

If action_receipts.signer_account_id = 'ryancwalsh.near' is special in that it's queried more regularly than others, consider a small partial index for it, to fix just that case. Like:

CREATE INDEX ON action_receipts (receipt_id)
WHERE signer_account_id = 'ryancwalsh.near';
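
With that partial index in place, the highly selective part of the query should resolve to a cheap (index-only) scan. A quick sanity check, as a sketch:

-- Should show a scan of the new partial index rather than a scan of the full table.
EXPLAIN (ANALYZE)
SELECT receipt_id
FROM   public.action_receipts
WHERE  signer_account_id = 'ryancwalsh.near';

Note that a partial index only helps queries whose WHERE clause matches the index predicate, so this fixes exactly this one signer_account_id and nothing else.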