Home > Software engineering >  PostgreSQL: Obtaining multiple values from the record type
PostgreSQL: Obtaining multiple values from the record type

Time:12-22

I am writing an sql program that creates a table with columns computed in a function. The function returns the record type. This is what the header looks like:

create or replace function get_items(col1 int, col2 int) returns record

What I would like to be able to do is:

create table table_items as (
with q as (
    select *,
   (SELECT * FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer))
    from 
    table_t as t
    )
    select * from q
);

however, that results in:

ERROR:  subquery must return only one column

To fix the error I changed the code into:

create table table_items as (
with q as (
    select *,
   (SELECT item1 FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer)),
   (SELECT item2 FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer)) 
    from 
    table_t as t
    )
    select * from q
);

This solution works, though twice as slow when item2 is obtained in addition to item1. I assume that it is because the same query is executed twice. Is there a way to obtain both items having the function invoked only once? Thanks a lot!

CodePudding user response:

This is a good use case for a lateral join.

create table table_items as 
select *
from table_t as t
cross join lateral
get_items(t.col1, t.col2) as l(item1 integer, item2 integer);

CodePudding user response:

My guess is that the reason is that returns record is polymorhic. Try with RETURNS TABLE (col1 int, col2 int) or return custom type

CodePudding user response:

A direct implementation of what you planned initially should be faster than the lateral join. As suggested by @esmin: if you're only getting (item1 integer, item2 integer) type of record from the function, it's better to define it up front (here's an online demo for everything below):

drop function if exists get_items;

create type get_items_return_rec as (a int, b int);

create or replace function get_items(col1 int,col2 int) 
  returns get_items_return_rec language plpgsql as $$
begin
    return (col1,col2)::get_items_return_rec;
end $$;

Which will allow you to refer to its fields directly, exactly as you wanted to - calling it just once, splitting into fields later:

explain analyze 
select a,
       b,
       (rec).a,
       (rec).b 
from (
  select a,
         b,
         get_items(a,b) as rec
  from test) subquery;
--                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------
-- Seq Scan on test  (cost=0.00..360098.00 rows=700000 width=16) (actual time=2.244..816.545 rows=700000 loops=1)
-- Planning Time: 0.061 ms
-- JIT:
--   Functions: 2
--   Options: Inlining false, Optimization false, Expressions true, Deforming true
--   Timing: Generation 0.451 ms, Inlining 0.000 ms, Optimization 0.193 ms, Emission 1.807 ms, Total 2.451 ms
-- Execution Time: 841.719 ms

Compare that to the lateral join and your initial solution on the same, 700k row sample:

explain analyze
  select a,
         b,
         (select c from get_items(a,b) as rec(c int,d int)),
         (select d from get_items(a,b) as rec(c int,d int))
  from test;
--                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
-- Seq Scan on test  (cost=0.00..374098.00 rows=700000 width=16) (actual time=6.845..2275.461 rows=700000 loops=1)
--   SubPlan 1
--     ->  Function Scan on get_items rec  (cost=0.25..0.26 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=700000)
--   SubPlan 2
--     ->  Function Scan on get_items rec_1  (cost=0.25..0.26 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=700000)
-- Planning Time: 0.073 ms
-- JIT:
--   Functions: 18
--   Options: Inlining false, Optimization false, Expressions true, Deforming true
--   Timing: Generation 1.311 ms, Inlining 0.000 ms, Optimization 0.319 ms, Emission 6.256 ms, Total 7.886 ms
-- Execution Time: 2300.981 ms
explain analyze
  select a,
         b
  from  test
  cross join lateral
  get_items(a,b) as rec(c int,d int);

--                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
-- Nested Loop  (cost=0.25..24098.25 rows=700000 width=8) (actual time=0.064..1116.634 rows=700000 loops=1)
--   ->  Seq Scan on test  (cost=0.00..10098.00 rows=700000 width=8) (actual time=0.014..44.813 rows=700000 loops=1)
--   ->  Function Scan on get_items rec  (cost=0.25..0.26 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=700000)
-- Planning Time: 0.080 ms
-- Execution Time: 1140.850 ms

If it's speed you're after, see if you can rewrite your function as a plain SQL function instead of using plpgsql. The example is trivial, but it still illustrates the comparative difference you can expect:

drop function if exists get_items;
create or replace function get_items(col1 int,col2 int) 
  returns get_items_return_rec language sql stable as $$
select (col1,col2)::get_items_return_rec
$$;

explain analyze
select a,
       b,
       (rec).a,
       (rec).b
from (
  select a,
         b,
         get_items(a,b) as rec
  from test) subq;
--                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
-- Seq Scan on test  (cost=0.00..10098.00 rows=700000 width=16) (actual time=0.010..64.552 rows=700000 loops=1)
-- Planning Time: 0.085 ms
-- Execution Time: 88.892 ms

Also, check out function volatility levels and if your function isn't causing side-effects, make it at least stable.

  • Related