Home > Back-end >  Performance impact of multiple aggregate functions in HAVING clause
Performance impact of multiple aggregate functions in HAVING clause

Time:09-22

I am wondering if there is a difference between the two queries below.

I am looking for a general answer to explain how the optimizer treats each of these answers. There is an index on t.id.

The version of Oracle is 11g.

select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) between -0.009 and 0.009
select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) >= -0.009 and sum(t.amount)<= 0.009

CodePudding user response:

In an aggregation query, most of the work involves moving the data around. There is some overhead for aggregations, but it is usually pretty simple.

And, the SQL compiler can decide if it wants to re-use aggregated expressions. Just because you use sum(amount) twice in the query doesn't mean that it gets executed twice.

Some aggregation functions are more expensive -- especially on strings or using distinct. You can always test queries to see if there is much impact, but in general, you should worry about whether your logic is correct not how many times you are using aggregation functions.

CodePudding user response:

If you want to obseve basic information about the steps decided by the CBO for the execution of SQL statement use explain plan

Example

EXPLAIN PLAN  SET STATEMENT_ID = 'jara1' into   plan_table  FOR
select DEPARTMENT_ID, sum(salary) 
from HR.employees
group by DEPARTMENT_ID
having sum(salary) between 5000 and 10000
;
--    
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));

The query returns

Plan hash value: 244580604
 
---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |     1 |     7 |     4  (25)| 00:00:01 |
|*  1 |  FILTER             |           |       |       |            |          |
|   2 |   HASH GROUP BY     |           |     1 |     7 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMPLOYEES |   107 |   749 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$1
   3 - SEL$1 / EMPLOYEES@SEL$1
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(SUM("SALARY")>=5000 AND SUM("SALARY")<=10000)
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - (rowset=256) "DEPARTMENT_ID"[NUMBER,22], SUM("SALARY")[22]
   2 - (#keys=1; rowset=256) "DEPARTMENT_ID"[NUMBER,22], 
       SUM("SALARY")[22]
   3 - (rowset=256) "SALARY"[NUMBER,22], "DEPARTMENT_ID"[NUMBER,22]

So first of all you see a TABLE ACCESS FULL is performed (line 3), so your index assumption is not correct.

As pointed in other answer, you see the between is translated in two perdicates connected with and (filter line 1).

But most impertant fro yur question is the Column Projection, you see that the sum(SALARY) is calculated in line 2 (HASH GROUP BY operation) and passed to the line 1 (FILTER), in both cases only once (one column with length 22).

So don't worry about multiple calculation.

CodePudding user response:

There is absolutely no difference between the two queries. between is just syntactical sugar; the parser immediately transforms the between condition into the two inequalities, combined with the and operator. This is done even before the optimizer sees the query. (Note that in this context the distinction between the parsing and the optimization stages is meaningful, even though often programmers think of them as a single step.)

Trivial example:

SQL> set autotrace traceonly explain
SQL> select deptno, sum(sal) as sum_sal
  2  from   scott.emp
  3  group  by deptno
  4  having sum(sal) between 10000 and 20000
  5  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2138686577

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |     7 |     4  (25)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   HASH GROUP BY     |      |     1 |     7 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SUM("SAL")>=10000 AND SUM("SAL")<=20000)

The "index on..." thing that you mention has nothing to do with the question.

CodePudding user response:

Another fun way to test this:

with function expand_sql_text(text_in varchar2) 
         return varchar2
     as
         text_out long;
     begin
         dbms_utility.expand_sql_text(text_in, text_out);
         return text_out;
     end expand_sql_text;
select expand_sql_text(
         'select * from dual where 2 between 1 and 3'
       ) as text_out
from   dual
/
TEXT_OUT
------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT "A1"."DUMMY" "DUMMY" FROM "SYS"."DUAL" "A1" WHERE 2>=1 AND 2<=3

1 row selected.

In your original question, the second predicate was

having sum(t.amount) > -0.009 and sum(t.amount)< 0.009

which is not the same as the between version, because between is not exclusive.

In SQL generally, filter predicates against simple literals do not normally lead to any significant performance overhead. In a group by clause, the fact that the predicate is applied after aggregation reduces any overhead even further.

  • Related