I'm using this public Postgres DB of NEAR protocol: https://github.com/near/near-indexer-for-explorer#shared-public-access
postgres://public_readonly:[email protected]/mainnet_explorer
SELECT "public"."receipts"."receipt_id",
"public"."receipts"."included_in_block_hash",
"public"."receipts"."included_in_chunk_hash",
"public"."receipts"."index_in_chunk",
"public"."receipts"."included_in_block_timestamp",
"public"."receipts"."predecessor_account_id",
"public"."receipts"."receiver_account_id",
"public"."receipts"."receipt_kind",
"public"."receipts"."originated_from_transaction_hash"
FROM "public"."receipts"
WHERE ("public"."receipts"."receipt_id") IN
(SELECT "t0"."receipt_id"
FROM "public"."receipts" AS "t0"
INNER JOIN "public"."action_receipts" AS "j0" ON ("j0"."receipt_id") = ("t0"."receipt_id")
WHERE ("j0"."signer_account_id" = 'ryancwalsh.near'
AND "t0"."receipt_id" IS NOT NULL))
ORDER BY "public"."receipts"."included_in_block_timestamp" DESC
LIMIT 1
OFFSET 0
always results in:
ERROR: canceling statement due to statement timeout
SQL state: 57014
But if I change it to LIMIT 2, the query runs in less than 1 second.
How would that ever be the case? Does that mean the database isn't set up well? Or am I doing something wrong?
P.S. The query here was generated via Prisma. findFirst always times out, so I think I might need to change it to findMany as a workaround.
CodePudding user response:
Your query can be simplified and optimized:
SELECT r.receipt_id
, r.included_in_block_hash
, r.included_in_chunk_hash
, r.index_in_chunk
, r.included_in_block_timestamp
, r.predecessor_account_id
, r.receiver_account_id
, r.receipt_kind
, r.originated_from_transaction_hash
FROM public.receipts r
WHERE EXISTS (
SELECT FROM public.action_receipts j
WHERE j.receipt_id = r.receipt_id
AND j.signer_account_id = 'ryancwalsh.near'
)
ORDER BY r.included_in_block_timestamp DESC
LIMIT 1;
However, that only scratches the surface of your underlying problem.
Like Kirk already commented, Postgres chooses a different query plan for LIMIT 1, obviously ignorant of the fact that there are only 90 rows in table action_receipts with signer_account_id = 'ryancwalsh.near', while both involved tables have more than 220 million rows and are growing steadily.
Changing to LIMIT 2 makes a different query plan seem more favorable, hence the observed difference in performance. (So the query planner has the general idea that the filter is very selective, just not close enough for the corner case of LIMIT 1.)
You should have mentioned cardinalities to set us on the right track.
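To see the plan flip for yourself, you can compare what the planner chooses for each limit. A minimal sketch, using the simplified query from above: plain EXPLAIN only plans the query without running it, while EXPLAIN (ANALYZE, BUFFERS) executes it and would run into the same statement timeout for the slow LIMIT 1 variant on the shared server.
EXPLAIN
SELECT r.receipt_id
     , r.included_in_block_timestamp
FROM   public.receipts r
WHERE  EXISTS (
   SELECT FROM public.action_receipts j
   WHERE  j.receipt_id = r.receipt_id
   AND    j.signer_account_id = 'ryancwalsh.near'
   )
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;   -- run once with LIMIT 1, once with LIMIT 2, and compare the two plans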
Knowing that our filter is so selective, we can force a more favorable query plan with a different query:
WITH j AS (
SELECT receipt_id -- is PK!
FROM public.action_receipts
WHERE signer_account_id = 'ryancwalsh.near'
)
SELECT r.receipt_id
, r.included_in_block_hash
, r.included_in_chunk_hash
, r.index_in_chunk
, r.included_in_block_timestamp
, r.predecessor_account_id
, r.receiver_account_id
, r.receipt_kind
, r.originated_from_transaction_hash
FROM j
JOIN public.receipts r USING (receipt_id)
ORDER BY r.included_in_block_timestamp DESC
LIMIT 1;
This uses the same query plan for LIMIT 1 and LIMIT 2, and either variant finishes in under 2 ms in my test:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=134904.89..134904.89 rows=1 width=223) (actual time=1.750..1.754 rows=1 loops=1)
CTE j
-> Bitmap Heap Scan on action_receipts (cost=319.46..41564.59 rows=10696 width=44) (actual time=0.058..0.179 rows=90 loops=1)
Recheck Cond: (signer_account_id = 'ryancwalsh.near'::text)
Heap Blocks: exact=73
-> Bitmap Index Scan on action_receipt_signer_account_id_idx (cost=0.00..316.79 rows=10696 width=0) (actual time=0.043..0.043 rows=90 loops=1)
Index Cond: (signer_account_id = 'ryancwalsh.near'::text)
-> Sort (cost=93340.30..93367.04 rows=10696 width=223) (actual time=1.749..1.750 rows=1 loops=1)
Sort Key: r.included_in_block_timestamp DESC
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.70..93286.82 rows=10696 width=223) (actual time=0.089..1.705 rows=90 loops=1)
-> CTE Scan on j (cost=0.00..213.92 rows=10696 width=32) (actual time=0.060..0.221 rows=90 loops=1)
-> Index Scan using receipts_pkey on receipts r (cost=0.70..8.70 rows=1 width=223) (actual time=0.016..0.016 rows=1 loops=90)
Index Cond: (receipt_id = j.receipt_id)
Planning Time: 0.281 ms
Execution Time: 1.801 ms
The point is that the DB at hand runs Postgres 11, where CTEs are always optimization barriers, so our hugely selective query is executed first, and Postgres does not attempt to walk the index on (included_in_block_timestamp) under the wrong assumption that it would find a matching row soon enough. (It does not.)
In Postgres 12 or later you would add AS MATERIALIZED to the CTE to guarantee the same effect.
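For illustration, the same query with the CTE explicitly materialized (Postgres 12+ syntax; r.* stands in for the explicit column list above and is only a shorthand for this sketch):
WITH j AS MATERIALIZED (   -- Postgres 12+: prevent inlining of the CTE
   SELECT receipt_id
   FROM   public.action_receipts
   WHERE  signer_account_id = 'ryancwalsh.near'
   )
SELECT r.*                 -- or list the columns you actually need
FROM   j
JOIN   public.receipts r USING (receipt_id)
ORDER  BY r.included_in_block_timestamp DESC
LIMIT  1;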
Or you could use the "OFFSET 0 hack" in any version like so:
SELECT r.receipt_id
, r.included_in_block_hash
, r.included_in_chunk_hash
, r.index_in_chunk
, r.included_in_block_timestamp
, r.predecessor_account_id
, r.receiver_account_id
, r.receipt_kind
, r.originated_from_transaction_hash
FROM (
SELECT receipt_id -- is PK!
FROM public.action_receipts
WHERE signer_account_id = 'ryancwalsh.near'
OFFSET 0
) j
JOIN public.receipts r USING (receipt_id)
ORDER BY r.included_in_block_timestamp DESC
LIMIT 1;
Prevents "inlining" of the subquery to the same result. Finishes in < 2ms.
See:
"Fix" the database?
The proper fix depends on the complete picture. The underlying problem is that Postgres overestimates the number of qualifying rows in table action_receipts. The MCV list cannot keep up with 220 million rows (and growing). It's most probably not just ANALYZE lagging behind. (Though it could be: autovacuum not properly configured? Rookie mistake?)
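As a quick diagnostic, you could check when the table was last analyzed and what the planner currently believes about the column. A sketch against the standard statistics views (read access to them is assumed):
SELECT last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname = 'action_receipts';

SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'action_receipts'
AND    attname   = 'signer_account_id';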
Depending on the actual cardinalities (data distribution) in action_receipts.signer_account_id and access patterns, you could do various things to "fix" it. Two options:
1. Increase n_distinct and STATISTICS
If most values in action_receipts.signer_account_id are equally rare (high cardinality), consider setting a very large n_distinct value for the column. Combine that with a moderately increased STATISTICS target for the column to counter errors in the other direction (underestimating the number of qualifying rows for actually more common values).
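A sketch of what that could look like. The concrete numbers are placeholders that would have to be derived from the real data distribution, and the statements require owner privileges on the table, so this is not something you can run over the public read-only connection:
ALTER TABLE public.action_receipts
   ALTER COLUMN signer_account_id SET (n_distinct = -0.05);  -- "5 % of rows are distinct"; a large positive count works, too

ALTER TABLE public.action_receipts
   ALTER COLUMN signer_account_id SET STATISTICS 500;        -- moderately raised from the default of 100

ANALYZE public.action_receipts;  -- new settings only take effect after the next ANALYZE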
2. Local fix with partial index
If action_receipts.signer_account_id = 'ryancwalsh.near' is special in that it's queried more regularly than others, consider a small partial index for it, to fix just that case. Like:
CREATE INDEX ON action_receipts (receipt_id)
WHERE signer_account_id = 'ryancwalsh.near';