I am writing an sql program that creates a table with columns computed in a function. The function returns the record type. This is what the header looks like:
create or replace function get_items(col1 int, col2 int) returns record
What I would like to be able to do is:
create table table_items as (
with q as (
select *,
(SELECT * FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer))
from
table_t as t
)
select * from q
);
however, that results in:
ERROR: subquery must return only one column
To fix the error I changed the code into:
create table table_items as (
with q as (
select *,
(SELECT item1 FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer)),
(SELECT item2 FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer))
from
table_t as t
)
select * from q
);
This solution works, though twice as slow when item2 is obtained in addition to item1. I assume that it is because the same query is executed twice. Is there a way to obtain both items having the function invoked only once? Thanks a lot!
CodePudding user response:
This is a good use case for a lateral join.
create table table_items as
select *
from table_t as t
cross join lateral
get_items(t.col1, t.col2) as l(item1 integer, item2 integer);
CodePudding user response:
My guess is that the reason is that returns record
is polymorhic. Try with RETURNS TABLE (col1 int, col2 int)
or return custom type
CodePudding user response:
A direct implementation of what you planned initially should be faster than the lateral join. As suggested by @esmin: if you're only getting (item1 integer, item2 integer)
type of record from the function, it's better to define it up front (here's an online demo for everything below):
drop function if exists get_items;
create type get_items_return_rec as (a int, b int);
create or replace function get_items(col1 int,col2 int)
returns get_items_return_rec language plpgsql as $$
begin
return (col1,col2)::get_items_return_rec;
end $$;
Which will allow you to refer to its fields directly, exactly as you wanted to - calling it just once, splitting into fields later:
explain analyze
select a,
b,
(rec).a,
(rec).b
from (
select a,
b,
get_items(a,b) as rec
from test) subquery;
-- QUERY PLAN
------------------------------------------------------------------------------------------------------------------
-- Seq Scan on test (cost=0.00..360098.00 rows=700000 width=16) (actual time=2.244..816.545 rows=700000 loops=1)
-- Planning Time: 0.061 ms
-- JIT:
-- Functions: 2
-- Options: Inlining false, Optimization false, Expressions true, Deforming true
-- Timing: Generation 0.451 ms, Inlining 0.000 ms, Optimization 0.193 ms, Emission 1.807 ms, Total 2.451 ms
-- Execution Time: 841.719 ms
Compare that to the lateral join and your initial solution on the same, 700k row sample:
explain analyze
select a,
b,
(select c from get_items(a,b) as rec(c int,d int)),
(select d from get_items(a,b) as rec(c int,d int))
from test;
-- QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
-- Seq Scan on test (cost=0.00..374098.00 rows=700000 width=16) (actual time=6.845..2275.461 rows=700000 loops=1)
-- SubPlan 1
-- -> Function Scan on get_items rec (cost=0.25..0.26 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=700000)
-- SubPlan 2
-- -> Function Scan on get_items rec_1 (cost=0.25..0.26 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=700000)
-- Planning Time: 0.073 ms
-- JIT:
-- Functions: 18
-- Options: Inlining false, Optimization false, Expressions true, Deforming true
-- Timing: Generation 1.311 ms, Inlining 0.000 ms, Optimization 0.319 ms, Emission 6.256 ms, Total 7.886 ms
-- Execution Time: 2300.981 ms
explain analyze
select a,
b
from test
cross join lateral
get_items(a,b) as rec(c int,d int);
-- QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
-- Nested Loop (cost=0.25..24098.25 rows=700000 width=8) (actual time=0.064..1116.634 rows=700000 loops=1)
-- -> Seq Scan on test (cost=0.00..10098.00 rows=700000 width=8) (actual time=0.014..44.813 rows=700000 loops=1)
-- -> Function Scan on get_items rec (cost=0.25..0.26 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=700000)
-- Planning Time: 0.080 ms
-- Execution Time: 1140.850 ms
If it's speed you're after, see if you can rewrite your function as a plain SQL
function instead of using plpgsql
. The example is trivial, but it still illustrates the comparative difference you can expect:
drop function if exists get_items;
create or replace function get_items(col1 int,col2 int)
returns get_items_return_rec language sql stable as $$
select (col1,col2)::get_items_return_rec
$$;
explain analyze
select a,
b,
(rec).a,
(rec).b
from (
select a,
b,
get_items(a,b) as rec
from test) subq;
-- QUERY PLAN
----------------------------------------------------------------------------------------------------------------
-- Seq Scan on test (cost=0.00..10098.00 rows=700000 width=16) (actual time=0.010..64.552 rows=700000 loops=1)
-- Planning Time: 0.085 ms
-- Execution Time: 88.892 ms
Also, check out function volatility levels and if your function isn't causing side-effects, make it at least stable
.