Postgres query filtered several different ways

Time:01-25

I have a big postgres query with lots of joins that I want to filter several different ways. I have one central function that performs the joins:

create function my_big_function(p_limit int, p_offset int, p_filter_ids int[])
returns setof my_type
language sql
stable  -- reads from tables, so stable rather than immutable
returns null on null input
as
$$
select my_column_list
from
(
    select my_filter_id
    from unnest(p_filter_ids) as u(my_filter_id)
    order by my_filter_id
    limit p_limit offset p_offset
) f(my_filter_id)
    inner join... several other tables (using indexed columns)
$$;

Then I have a series of short functions that I use to build the list of filter IDs such as:

create or replace function my_filter_id_function(p_some_id int)
returns int[]
language sql
stable  -- reads from my_table, so stable rather than immutable
returns null on null input
as
$$
select array_agg(my_filter_id) from my_table where some_id = p_some_id
$$;

This lets me add new filter functions quickly and pass the resulting arrays to the big query as an argument, without duplicating the big query in lots of places.

select * from my_big_function(1000, 0, my_filter_function1(p_some_id));
select * from my_big_function(1000, 0, my_filter_function2(p_some_other_id));
select * from my_big_function(1000, 0, my_filter_function3(p_yet_another_id));

The problem is that my queries are slow when the array returned by a filter function gets big (~1,000 elements). I presume this is because the big query has to sort the unnested values and then join on that non-indexed result? Is there a better way to have a single generic query that I can feed IDs into to filter it different ways?

CodePudding user response:

I would avoid large arrays, because packing and unpacking them becomes expensive.

But I would say that the main problem here is that you split the query into different functions, which prevents the optimizer from treating the whole query at once and coming up with an efficient execution plan.

If you want to avoid repeating parts of a query over and over, the correct tool is not a function, but a view. PostgreSQL replaces the view with its definition before it plans the query, so the optimizer can find a good plan for the whole query, filter conditions included.

Don't fall into the trap of defining a “world view” that joins all your tables. The view should only contain the tables that you actually need in the query.
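
As a rough sketch of what that could look like: the question does not show the real tables or joins, so my_detail, payload, my_big_view and the literal 42 below are made-up placeholders, and only my_table, my_filter_id and some_id come from the question. The idea is that the shared joins live in the view once, and each filter becomes a plain WHERE clause on it.

```sql
-- Hypothetical schema for illustration only; the real tables are not shown.
create table my_table (
    my_filter_id int primary key,
    some_id      int not null
);
create index on my_table (some_id);

create table my_detail (
    my_filter_id int not null references my_table (my_filter_id),
    payload      text
);
create index on my_detail (my_filter_id);

-- The view holds the shared joins once. It takes no parameters
-- and applies no filter of its own.
create view my_big_view as
select t.my_filter_id,
       t.some_id,
       d.payload
from my_table t
     inner join my_detail d on d.my_filter_id = t.my_filter_id;

-- Each filter is an ordinary WHERE clause on the view. The view is
-- expanded into the query, so the joins, the filter, the ordering and
-- the limit are all planned together and the indexes can be used.
select *
from my_big_view
where some_id = 42
order by my_filter_id
limit 1000 offset 0;
```

Running the final select under EXPLAIN (ANALYZE) should show whether the filter is actually applied through an index before the joins, which is the thing the array-passing version makes hard for the planner.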
