I'm trying to migrate from SQL Server to Postgresql. Here is my Posgresql code:
Create View person_names As
SELECT lp."Code", n."Name", n."Type"
from "Persons" lp
Left Join LATERAL
(
Select *
From "Names" n
Where n.id = lp.id
Order By "Date" desc
Limit 1
) n on true
limit 100;
Explain
Select "Code" From person_names;
It prints
"Subquery Scan on person_names (cost=0.42..448.85 rows=100 width=10)"
" -> Limit (cost=0.42..447.85 rows=100 width=56)"
" -> Nested Loop Left Join (cost=0.42..303946.91 rows=67931 width=56)"
" -> Seq Scan on ""Persons"" lp (cost=0.00..1314.31 rows=67931 width=10)"
" -> Limit (cost=0.42..4.44 rows=1 width=100)"
" -> Index Only Scan Backward using ""IX_Names_Person"" on ""Names"" n (cost=0.42..4.44 rows=1 width=100)"
" Index Cond: ("id" = (lp."id")::numeric)"
Why there is an "Index Only Scan" for the "Names" table? This table is not required to get a result. On SQL Server I get only a single scan over the "Persons" table. How can I tune Postgres to get a better query plans? I'm trying the lastest version, which is the Postgresql 15 beta 3.
Here is SQL Server version:
Create View person_names As
SELECT top 100 lp."Code", n."Name", n."Type"
from "Persons" lp
Outer Apply
(
Select Top 1 *
From "Names" n
Where n.id = lp.id
Order By "Date" desc
) n
GO
SET SHOWPLAN_TEXT ON;
GO
Select "Code" From person_names;
It gives correct execution plan:
|--Top(TOP EXPRESSION:((100)))
|--Index Scan(OBJECT:([Persons].[IX_Persons] AS [lp]))
CodePudding user response:
Change the lateral join to a regular left join, then Postgres is able to remove the select on the Names
table:
create View person_names
As
SELECT lp.Code, n.Name, n.Type
from Persons lp
Left Join (
Select distinct on (id) *
From Names n
Order By id, Date desc
) n on n.id = lp.id
limit 100;
The following index will support the distinct on ()
in case you do include columns from the Names
table:
create index on "Names"(id, "Date" desc);
For select code from names
this gives me this plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on persons lp (cost=0.00..309.00 rows=20000 width=7) (actual time=0.009..1.348 rows=20000 loops=1)
Planning Time: 0.262 ms
Execution Time: 1.738 ms
For select Code, name, type From person_names;
this gives me this plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=559.42..14465.93 rows=20000 width=25) (actual time=5.585..68.545 rows=20000 loops=1)
Hash Cond: (n.id = lp.id)
-> Unique (cost=0.42..13653.49 rows=20074 width=26) (actual time=0.053..57.323 rows=20000 loops=1)
-> Index Scan using names_id_date_idx on names n (cost=0.42..12903.49 rows=300000 width=26) (actual time=0.052..41.125 rows=300000 loops=1)
-> Hash (cost=309.00..309.00 rows=20000 width=11) (actual time=5.407..5.407 rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1116kB
-> Seq Scan on persons lp (cost=0.00..309.00 rows=20000 width=11) (actual time=0.011..2.036 rows=20000 loops=1)
Planning Time: 0.460 ms
Execution Time: 69.180 ms
Of course I had to guess the table structures as you haven't provided any DDL.
CodePudding user response:
Change your view definition like that
create view person_names as
select p."Code",
(select "Name"
from "Names" n
where n.id = p.id
order by "Date" desc
limit 1)
from "Persons" p
limit 100;