Home > OS >  Bad execution plan on Postgresql
Bad execution plan on Postgresql

Time:08-29

I'm trying to migrate from SQL Server to Postgresql. Here is my Posgresql code:

Create View person_names As
SELECT lp."Code", n."Name", n."Type"
from "Persons" lp
Left Join LATERAL
(
    Select *
    From "Names" n
    Where n.id = lp.id
    Order By "Date" desc
    Limit 1
) n on true
limit 100;

Explain
Select "Code" From person_names;

It prints

"Subquery Scan on person_names  (cost=0.42..448.85 rows=100 width=10)"
"  ->  Limit  (cost=0.42..447.85 rows=100 width=56)"
"        ->  Nested Loop Left Join  (cost=0.42..303946.91 rows=67931 width=56)"
"              ->  Seq Scan on ""Persons"" lp  (cost=0.00..1314.31 rows=67931 width=10)"
"              ->  Limit  (cost=0.42..4.44 rows=1 width=100)"
"                    ->  Index Only Scan Backward using ""IX_Names_Person"" on ""Names"" n  (cost=0.42..4.44 rows=1 width=100)"
"                          Index Cond: ("id" = (lp."id")::numeric)"

Why there is an "Index Only Scan" for the "Names" table? This table is not required to get a result. On SQL Server I get only a single scan over the "Persons" table. How can I tune Postgres to get a better query plans? I'm trying the lastest version, which is the Postgresql 15 beta 3.

Here is SQL Server version:

Create View person_names As
SELECT top 100 lp."Code", n."Name", n."Type"
from "Persons" lp
Outer Apply
(
    Select Top 1 *
    From "Names" n
    Where n.id = lp.id
    Order By "Date" desc
) n
GO

SET SHOWPLAN_TEXT ON;
GO

Select "Code" From person_names;

It gives correct execution plan:

  |--Top(TOP EXPRESSION:((100)))
       |--Index Scan(OBJECT:([Persons].[IX_Persons] AS [lp]))

CodePudding user response:

Change the lateral join to a regular left join, then Postgres is able to remove the select on the Names table:

create View person_names 
As
SELECT lp.Code, n.Name, n.Type
from Persons lp
  Left Join (
    Select distinct on (id) *
    From Names n
    Order By id, Date desc
  ) n on n.id = lp.id
limit 100;

The following index will support the distinct on () in case you do include columns from the Names table:

create index on "Names"(id, "Date" desc);

For select code from names this gives me this plan:

QUERY PLAN                                                                                                  
------------------------------------------------------------------------------------------------------------
Seq Scan on persons lp  (cost=0.00..309.00 rows=20000 width=7) (actual time=0.009..1.348 rows=20000 loops=1)
Planning Time: 0.262 ms                                                                                     
Execution Time: 1.738 ms                                                                                    

For select Code, name, type From person_names; this gives me this plan:

QUERY PLAN                                                                                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join  (cost=559.42..14465.93 rows=20000 width=25) (actual time=5.585..68.545 rows=20000 loops=1)                                          
  Hash Cond: (n.id = lp.id)                                                                                                                          
  ->  Unique  (cost=0.42..13653.49 rows=20074 width=26) (actual time=0.053..57.323 rows=20000 loops=1)                                               
        ->  Index Scan using names_id_date_idx on names n  (cost=0.42..12903.49 rows=300000 width=26) (actual time=0.052..41.125 rows=300000 loops=1)
  ->  Hash  (cost=309.00..309.00 rows=20000 width=11) (actual time=5.407..5.407 rows=20000 loops=1)                                                  
        Buckets: 32768  Batches: 1  Memory Usage: 1116kB                                                                                             
        ->  Seq Scan on persons lp  (cost=0.00..309.00 rows=20000 width=11) (actual time=0.011..2.036 rows=20000 loops=1)                            
Planning Time: 0.460 ms                                                                                                                              
Execution Time: 69.180 ms                                                                                                                            

Of course I had to guess the table structures as you haven't provided any DDL.

Online example

CodePudding user response:

Change your view definition like that

create view person_names as
select p."Code",
       (select "Name"
        from "Names" n
        where n.id = p.id
        order by "Date" desc
        limit 1)
from "Persons" p
limit 100;
  • Related