I have two tables: the local table debtors and the foreign table debtor_registry. I'm using PostgreSQL v13.
My problem is that the following query takes 14 seconds to fetch 1000 records.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I was surprised that when I removed the ORDER BY clause from the query, it became much faster, taking only 194 ms for 1000 records.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
LIMIT 1000 OFFSET 0;
In another case, if I remove the settings column, which is a JSONB field, from the query and keep the ORDER BY clause, it takes only 101 ms to get 1000 records.
SELECT
    debtors.id,
    debtors.name
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I suspect it might be related to how much data I am trying to fetch.
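One quick way to test that suspicion is to measure how much JSONB data the join has to drag around (an illustrative query; note it pulls every settings value across the FDW, so run it sparingly):

SELECT pg_size_pretty(sum(pg_column_size(settings))) AS total_settings_size
FROM debtor_registry;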
Here is the EXPLAIN ANALYZE VERBOSE result when the settings JSONB field, ORDER BY name, and LIMIT 1000 are all in the query:
Limit  (cost=114722.78..114725.28 rows=1000 width=57) (actual time=13712.125..14002.827 rows=1000 loops=1)
  Output: debtors.id, debtors.name, debtor_registry.settings
  ->  Sort  (cost=114722.78..114725.63 rows=1140 width=57) (actual time=13703.171..13993.617 rows=1000 loops=1)
        Output: debtors.id, debtors.name, debtor_registry.settings
        Sort Key: debtors.name
        Sort Method: external merge  Disk: 82752kB
        ->  Hash Join  (cost=896.60..114664.90 rows=1140 width=57) (actual time=14.889..917.360 rows=10550 loops=1)
              Output: debtors.id, debtors.name, debtor_registry.settings
              Hash Cond: (((debtor_registry.id)::character varying)::text = (debtors.registry_uuid)::text)
              ->  Foreign Scan on public.debtor_registry  (cost=100.00..113832.74 rows=1137 width=48) (actual time=8.845..902.466 rows=10529 loops=1)
                    Output: debtor_registry.id, debtor_registry.company_id, debtor_registry.settings, debtor_registry.product
                    Remote SQL: SELECT id, settings FROM public.company_debtor
              ->  Hash  (cost=664.60..664.60 rows=10560 width=62) (actual time=6.027..6.028 rows=10554 loops=1)
                    Output: debtors.id, debtors.name, debtors.registry_uuid
                    Buckets: 16384  Batches: 1  Memory Usage: 1108kB
                    ->  Seq Scan on public.debtors  (cost=0.00..664.60 rows=10560 width=62) (actual time=0.019..4.726 rows=10560 loops=1)
                          Output: debtors.id, debtors.name, debtors.registry_uuid
Planning Time: 0.098 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.609 ms, Inlining 0.000 ms, Optimization 0.674 ms, Emission 7.991 ms, Total 10.274 ms
Execution Time: 14007.113 ms
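Two things stand out in this plan: the foreign scan fetches all 10529 rows (with their large settings values) before the sort, and the sort itself spills about 81 MB to disk (Sort Method: external merge). Raising work_mem would at least keep the sort in memory, though it does not avoid the remote fetch; the value below is only illustrative:

-- Session-level setting; size it to your workload and available RAM.
SET work_mem = '128MB';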
How can I make the first query faster without omitting the settings field, the ORDER BY name clause, or the LIMIT 1000?
UPDATE
I also found this similar question, but the answer does not solve my problem, since our sorting is dynamic and we build queries based on the frontend client's request.
Setting use_remote_estimate to 'true' doesn't help either. :(
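For reference, this is how I set that option on the postgres_fdw server (the server name debtor_server is a placeholder; use ADD instead of SET if the option has never been set on the server before):

ALTER SERVER debtor_server OPTIONS (SET use_remote_estimate 'true');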
CodePudding user response:
Try
with t as materialized
(
    SELECT -- your second query as-is
        debtors.id,
        debtors.name,
        debtor_registry.settings
    FROM debtors
    INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
    LIMIT 1000 OFFSET 0
)
select * from t ORDER BY name;
i.e. keep the plan of the quick second query and sort the result set after that. (Note that the inner LIMIT runs without an ORDER BY, so this returns some 1000 joined rows sorted by name, not necessarily the first 1000 rows by name.)
If your PostgreSQL version is older than 12, omit materialized, as CTEs are always materialized there anyway.
Second suggestion: sort and limit locally to pick the right records upfront, and only then pull the fat debtor_registry.settings values for just those 1000 records.
with t as materialized
(
    SELECT d.id, d.name, d.registry_uuid
    FROM debtors d
    ORDER BY d.name
    LIMIT 1000 OFFSET 0
)
select t.id, t.name, debtor_registry.settings
FROM t INNER JOIN debtor_registry ON debtor_registry.id = t.registry_uuid
ORDER BY t.name;
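If debtors.name is not indexed yet, a plain b-tree index would let the local ORDER BY name LIMIT 1000 step run as a cheap index scan instead of a sort (the index name below is illustrative):

-- Hypothetical supporting index for the local sort/limit step.
CREATE INDEX IF NOT EXISTS debtors_name_idx ON debtors (name);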