I have two tables: the local table debtors and the foreign table debtor_registry. I'm using PostgreSQL v13.
My problem is that the following query takes 14 seconds to fetch 1000 records.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I was surprised that when I removed the ORDER BY clause from the query, it became much faster, taking only 194 ms for 1000 records.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
LIMIT 1000 OFFSET 0;
In another case, if I remove the settings column, which is a JSONB field, from the query and keep the ORDER BY clause, it takes only 101 ms to get 1000 records.
SELECT
    debtors.id,
    debtors.name
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I suspect it might be related to how much data I am trying to fetch.
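One quick way to test that suspicion is to measure how much JSONB data the join has to drag around (an illustrative query; note it pulls every settings value across the FDW, so run it sparingly):

SELECT pg_size_pretty(sum(pg_column_size(settings))) AS total_settings_size
FROM debtor_registry;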
Here is the EXPLAIN ANALYZE VERBOSE result when the settings JSONB field, ORDER BY name, and LIMIT 1000 are all in the query:
Limit  (cost=114722.78..114725.28 rows=1000 width=57) (actual time=13712.125..14002.827 rows=1000 loops=1)
  Output: debtors.id, debtors.name, debtor_registry.settings
  ->  Sort  (cost=114722.78..114725.63 rows=1140 width=57) (actual time=13703.171..13993.617 rows=1000 loops=1)
        Output: debtors.id, debtors.name, debtor_registry.settings
        Sort Key: debtors.name
        Sort Method: external merge  Disk: 82752kB
        ->  Hash Join  (cost=896.60..114664.90 rows=1140 width=57) (actual time=14.889..917.360 rows=10550 loops=1)
              Output: debtors.id, debtors.name, debtor_registry.settings
              Hash Cond: (((debtor_registry.id)::character varying)::text = (debtors.registry_uuid)::text)
              ->  Foreign Scan on public.debtor_registry  (cost=100.00..113832.74 rows=1137 width=48) (actual time=8.845..902.466 rows=10529 loops=1)
                    Output: debtor_registry.id, debtor_registry.company_id, debtor_registry.settings, debtor_registry.product
                    Remote SQL: SELECT id, settings FROM public.company_debtor
              ->  Hash  (cost=664.60..664.60 rows=10560 width=62) (actual time=6.027..6.028 rows=10554 loops=1)
                    Output: debtors.id, debtors.name, debtors.registry_uuid
                    Buckets: 16384  Batches: 1  Memory Usage: 1108kB
                    ->  Seq Scan on public.debtors  (cost=0.00..664.60 rows=10560 width=62) (actual time=0.019..4.726 rows=10560 loops=1)
                          Output: debtors.id, debtors.name, debtors.registry_uuid
Planning Time: 0.098 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.609 ms, Inlining 0.000 ms, Optimization 0.674 ms, Emission 7.991 ms, Total 10.274 ms
Execution Time: 14007.113 ms
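Two things stand out in this plan: the foreign scan fetches all 10529 rows (with their large settings values) before the sort, and the sort itself spills about 81 MB to disk (Sort Method: external merge). Raising work_mem would at least keep the sort in memory, though it does not avoid the remote fetch; the value below is only illustrative:

-- Session-level setting; size it to your workload and available RAM.
SET work_mem = '128MB';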
How can I make the first query faster without omitting the settings field, the ORDER BY name clause, or the LIMIT 1000?
UPDATE
I also found this similar question, but the answer does not solve my problem, since our sorting is dynamic and we build queries based on the frontend client's request.
Setting use_remote_estimate to 'true' doesn't help either. :(
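For reference, this is how I set that option on the postgres_fdw server (the server name debtor_server is a placeholder; use ADD instead of SET if the option has never been set on the server before):

ALTER SERVER debtor_server OPTIONS (SET use_remote_estimate 'true');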
CodePudding user response:
Try
with t as materialized
(
    SELECT -- your second query as-is
        debtors.id,
        debtors.name,
        debtor_registry.settings
    FROM debtors
    INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
    LIMIT 1000 OFFSET 0
)
select * from t ORDER BY name;
i.e. keep the plan of the quick second query and sort the result set after that. (Note that the inner LIMIT runs without an ORDER BY, so this returns some 1000 joined rows sorted by name, not necessarily the first 1000 rows by name.)
If your PostgreSQL version is older than 12, omit materialized, as CTEs are always materialized there anyway.
Second suggestion: sort and limit locally to pick the right records upfront, and only then pull the fat debtor_registry.settings values for just those 1000 records.
with t as materialized
(
    SELECT d.id, d.name, d.registry_uuid
    FROM debtors d
    ORDER BY d.name
    LIMIT 1000 OFFSET 0
)
select t.id, t.name, debtor_registry.settings
FROM t INNER JOIN debtor_registry ON debtor_registry.id = t.registry_uuid
ORDER BY t.name;
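If debtors.name is not indexed yet, a plain b-tree index would let the local ORDER BY name LIMIT 1000 step run as a cheap index scan instead of a sort (the index name below is illustrative):

-- Hypothetical supporting index for the local sort/limit step.
CREATE INDEX IF NOT EXISTS debtors_name_idx ON debtors (name);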