Home > database >  Does using HAVING in SQL here compute an aggregate function a second time?
Does using HAVING in SQL here compute an aggregate function a second time?

Time:12-03

I saw this query as an answer to another question on this site:

SELECT MAX(date), thread_id
FROM table
GROUP BY thread_id 
HAVING MAX(date) < 1555

With this database sample:

 ----------------------------- 
|  id |   date  |  thread_id  |
 ----- --------- ------------- 
|  1  |   1111  |      4      |
|  2  |   1333  |      4      |
|  3  |   1444  |      5      |
|  4  |   1666  |      5      |
 ----------------------------- 

Am I correct in assuming MAX(date) is computed twice here?

If so, this would definitely reduce the efficiency of this query. Is it possible to refactor the query so that MAX(date) is only computed once, so that performance can be maximised?

CodePudding user response:

Absolutly NOT !

The letters SQL means Structured Query Language. The most important word into this name is QUERY that means it is not a procedural language. In a procedural language, you write the exact commands that you want the computer to do. In SQL, a "query" language, you do not write a program code, but only the desired answer, then the SQL algrebrizer/optimizer have to compute the program that will be executed by the query processor (known as "query execution plan").

SQL is translated into relational algebra which is a simple mathematic formula and then be simplified by the algrebrizer like the work you've done at school when the teacher gives you a complex equation to solve : factorization, substitution...

The SQL engine will do the same, by factorizing the MAX(date) that will be compute once only !

CodePudding user response:

A peek into the query pipeline/execution plan will answer your question. During the GROUP BY aggregation step, MySQL will compute the max date for each thread_id. Then, during the HAVING filter, the max date will already be available to use. So, I would expect MAX(date) to be computed only once.

Note that MySQL actually permits using aliases in the HAVING clause, so you could have written your query as:

SELECT thread_id, MAX(date) AS max_date
FROM yourTable
GROUP BY thread_id 
HAVING max_date < 1555;
  • Related