Home > Blockchain >  How does having clause filter rows in mysql?
How does having clause filter rows in mysql?

Time:08-21

As I know HAVING clause is used to filter rows for each group. I have a table that stores scores of students.

create table sc
(
    `classid` int,
    `studentid` int,
    `score` int
);

Here is the sample data:

 --------- ----------- ------- 
| classid | studentid | score |
 --------- ----------- ------- 
|       1 |         1 |    50 |
|       1 |         2 |    59 |
|       1 |         3 |    80 |
|       1 |         4 |    68 |
|       1 |         5 |    70 |
|       1 |         6 |    20 |
|       1 |         7 |    90 |
|       1 |         8 |   100 |
|       1 |         9 |    25 |
|       2 |         1 |    51 |
|       2 |         2 |    59 |
|       2 |         3 |    80 |
|       2 |         4 |    68 |
|       2 |         5 |    70 |
|       2 |         6 |    30 |
|       2 |         7 |    44 |
|       2 |         8 |    80 |
|       3 |         1 |    20 |
|       1 |        11 |    30 |
|       1 |        12 |    40 |
 --------- ----------- ------- 

And I want to query the max score of each class, so I wrote this SQL statement:

select * 
from sc 
group by classid 
having score = max(score);

But the output is not what I expect. The output only prints one row.

 --------- ----------- ------- 
| classid | studentid | score |
 --------- ----------- ------- 
|       3 |         1 |    20 |
 --------- ----------- ------- 

CodePudding user response:

  1. If you have columns in your SELECT clause that are not being aggregated by an aggregate formula like sum(), max(), avg(), etc, then those columns also need to be present in your GROUP BY. Older versions of mysql are the only RDBMS that doesn't error when you miss this step. Instead of erroring, it will just grab whichever values it wishes and give you random results every time you run.

  2. HAVING is one of the last steps to execute in SQL (just before ORDER BY). there's some nuance there with window functions and other stuff that executes late. Because of that doing score = max(score) doesn't make much sense. Either score is aggregated at this point or it is not. You can't compare both aggregated state and non-aggregated state of that column at the same time.

Instead you want a correlated subquery:

SELECT *
FROM t1 as dt
WHERE score = 
   (
      SELECT MAX(score)
      FROM t1 as dt2
      WHERE dt.classid = dt2.classid
   );

Alternatively you can use window functions:

SELECT *
FROM 
    (
         SELECT classid, 
            studentid, 
            score,
            MAX(score) OVER (PARTITION BY classid) as maxclassscore
         FROM t1
    ) dt
WHERE score = maxclassscore;

CodePudding user response:

Your code is not valid, but with your actual data you get only one result as only one has 100

SELECT * FROM sc WHERE score = 
(select max(score) maxscore FROM sc);
classid | studentid | score
------: | --------: | ----:
      1 |         8 |   100

db<>fiddle here

  • Related