I have a function that totally lost its performance after I added another filter to it in PostgreSQL.
Here is a simple example of how it looked at first, with good performance:
CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean )
RETURNS TABLE(blablabla)
LANGUAGE sql
IMMUTABLE
AS $function$
with data as (
    select id, amount, account_nr from transfer
)
select *
from data d
where param_a or 0.00 <> (select sum(d2.amount)
                          from data d2
                          where d2.id = d.id)
$function$
;
(cost=0.25..10.25 rows=1000 width=560)
(actual time=1162.528..1162.561 rows=306 loops=1)
Buffers: shared hit=1099180
Planning time: 2.928 ms
Execution time: 1162.630 ms
After I added another filter with a subselect and a count, I lost my performance. Is this count really that bad for performance, and can I solve it another way?
CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean )
RETURNS TABLE(blablabla)
LANGUAGE sql
IMMUTABLE
AS $function$
with data as (
    select id, amount, account_nr from transfer
)
select *
from data d
where (param_b or 1 < (select count(d2.account_nr)
                       from data d2
                       where d2.id = d.id
                       group by d2.account_nr))
  and (param_a or 0.00 <> (select sum(d2.amount)
                           from data d2
                           where d2.id = d.id))
$function$
;
(cost=0.25..10.25 rows=1000 width=560)
(actual time=271191.341..271191.383 rows=306 loops=1)
Buffers: shared hit=1099180
Planning time: 2.955 ms
Execution time: 271191.463 ms
CodePudding user response:
Your slow query, embedded in your stored function, is this:
with data as ( -- original query from the question.
    select id, amount, account_nr from transfer
)
select *
from data d
where (param_b or 1 < (select count(d2.account_nr)
                       from data d2
                       where d2.id = d.id
                       group by d2.account_nr))
  and (param_a or 0.00 <> (select sum(d2.amount)
                           from data d2
                           where d2.id = d.id))
This has a pointless common table expression: it does nothing but select three columns from transfer, so we can get rid of it for simplicity's sake. You can always put it back if you need it for some other purpose.
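Inlining the CTE is a purely mechanical rewrite; the data alias simply becomes the transfer table:

select *
from transfer d
where (param_b or 1 < (select count(d2.account_nr)
                       from transfer d2
                       where d2.id = d.id
                       group by d2.account_nr))
  and (param_a or 0.00 <> (select sum(d2.amount)
                           from transfer d2
                           where d2.id = d.id))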
The query also has a couple of correlated subqueries. Let's refactor them into a single independent subquery, starting with that independent subquery:
select id,
       count(account_nr) account_count,
       sum(amount) total_amount
from transfer
group by id
This aggregate subquery generates the number of accounts and the total amount for each id in your transfer table. Eyeball the results to convince yourself it does what you need it to do.
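One way to eyeball it is to compare the aggregate row for a single id against the raw rows for that same id (the literal 42 below is only a hypothetical example; use an id that exists in your data):

-- aggregate row for one id
select id,
       count(account_nr) account_count,
       sum(amount) total_amount
from transfer
where id = 42
group by id;

-- raw rows for the same id, to check the numbers by hand
select id, account_nr, amount
from transfer
where id = 42;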
Then we can join this to the main query and apply your WHERE conditions.
select d.id, d.amount, d.account_nr
from transfer d
join (
    select id,
           count(account_nr) account_count,
           sum(amount) total_amount
    from transfer
    group by id
) d2 on d.id = d2.id
where (param_b or 1 < d2.account_count)
  and (param_a or 0.00 <> d2.total_amount)
Using the independent subquery can speed things up a lot; sometimes the query planner decides it needs to re-evaluate a correlated subquery many times, once per outer row.
The following index will help the subquery run faster.
CREATE INDEX id_details ON transfer (id) INCLUDE (account_nr, amount);
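To check whether the index actually helps, you can run the aggregate subquery under EXPLAIN and look for an index-only scan on id_details in the plan (a quick sanity check, not part of the final query):

explain (analyze, buffers)
select id,
       count(account_nr) account_count,
       sum(amount) total_amount
from transfer
group by id;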
Convince yourself this works and is fast enough. (I did not debug it, because I don't have your data.) You'll need to test it substituting true and false for param_a and param_b.
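For example, with both parameters replaced by false, so that both filters are actually applied:

select d.id, d.amount, d.account_nr
from transfer d
join (
    select id,
           count(account_nr) account_count,
           sum(amount) total_amount
    from transfer
    group by id
) d2 on d.id = d2.id
where (false or 1 < d2.account_count)       -- param_b := false
  and (false or 0.00 <> d2.total_amount);   -- param_a := false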
Then, and only then, put it into your stored function.
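Wired back into the function, it could look something like this (an untested sketch that keeps the original signature, the placeholder return type, and the IMMUTABLE attribute from the question):

CREATE OR REPLACE FUNCTION my_function(param_a boolean, param_b boolean)
RETURNS TABLE(blablabla)   -- placeholder return type, as in the question
LANGUAGE sql
IMMUTABLE
AS $function$
select d.id, d.amount, d.account_nr
from transfer d
join (
    select id,
           count(account_nr) account_count,
           sum(amount) total_amount
    from transfer
    group by id
) d2 on d.id = d2.id
where (param_b or 1 < d2.account_count)
  and (param_a or 0.00 <> d2.total_amount)
$function$;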