I have been refactoring some old queries recently and noticed that a lot of them repeat ISNULL()
in the GROUP BY
clause, where it is used in the SELECT
clause. I feel in my bones that removing the ISNULL()
in the GROUP BY
clause will improve performance, but I can't find any documentation on whether it is actually likely to or not. Here's the sort of thing I mean:
SELECT
ISNULL(Foo,-1) AS Foo
,ISNULL(Bar,-1) AS Bar
,SUM(This) AS This
,SUM(That) AS That
FROM
dbo.ThisThatTable AS ThisThat
LEFT JOIN dbo.FooBarTable AS FooBar ON ThisThat.FooBarId = FooBar.Id
GROUP BY
ISNULL(Foo,-1)
,ISNULL(Bar,-1);
GO
The above is the way I keep coming across - When there is grouping on the Foo column, the SELECT
and the GROUP BY
for selected columns match exactly. The example below is a possible alternative - some possibly unnecessary ISNULL()
calls have been removed, and the SELECT
and GROUP BY
clauses no longer match.
SELECT
ISNULL(Foo,-1) AS Foo
,ISNULL(Bar,-1) AS Bar
,SUM(This) AS This
,SUM(That) AS That
FROM
dbo.ThisThatTable AS ThisThat
LEFT JOIN dbo.FooBarTable AS FooBar ON ThisThat.FooBarId = FooBar.Id
GROUP BY
Foo
,Bar;
GO
I suppose maybe when the SELECT and GROUP BY
clauses match, the optimiser only has to do the ISNULL()
calculation once to know what is going on, so it might be theoretically more performative to group by the results that are actually selected? Alternatively, maybe it is better to avoid adding a second set of ISNULL()
calls that don't change the granularity of the data at all... Maybe the optimiser is clever enough to realise that the NULLS in the grouping are (in this case) -1s in the selection...?
I personally would prefer removing any unnecessary functions, especially once that might affect index usage but when I look online, the references to performance are all like the answers here, about using ISNULL()
in the WHERE
clause, which I already know to avoid.
I also suspect that any gains are going to be vanishingly small, so this is really asking for an academic or theoretical answer, but as I work, I keep wondering and it bugs me, so I thought I would ask if anyone has any thoughts.
CodePudding user response:
Non-aggregated columns in SELECT clauses generally must precisely match the ones in GROUP BY clauses. If I were you, and I were dealing with tested production code, I would not make the change you propose.
Edit the match between non-aggregated SELECT columns and GROUP BY columns is necessary for GROUP BY. If the columns in SELECT are 1:1 dependent on the columns in GROUP BY, it will work. Otherwise the results are ambiguous.
CodePudding user response:
Internally, SQL does not really have two copies of each ISNULL. They are all flattened together in the internal tree used during compilation. So, this level of optimization is not useful to consider in SQL Server. A query without any ISNULL in it would probably perform a bit faster and potentially a lot faster depending on the rest of the schema and query. However, the ISNULL in the select list and the GROUP BY list are not executed twice in most cases within SQL - this level of detail can show up in showplan, but it's often below the level of detail most people would care to examine.
CodePudding user response:
There are a few different aspects to consider here:
Referring to the same value multiple times in the same scope
In most situations, the optimizer is clever enough to collapse these into calculating them once. The fact that you have aGROUP BY
over them makes this even more likely.Is it faster to group when the value is guaranteed to not be null?
Possibly, although I doubt the difference is measurable.The
SELECT
does not have to match exactly, it only needs to be functionally dependent onGROUP BY
columns and aggregation functions. It may not be functionally dependent on any other columns.The most important thing top consider: indexing.
This is much, much more important than the other considerations. When grouping, if you can hit an index then it will go much faster, because it can remove sorting and just useStream Aggregate
. This is not possible if you useISNULL
in theGROUP BY
(barring computed columns or indexed views).
Note that your results will not be the same: the first example collapses the NULL
group into the -1
group. The second example does not, so you may want to remove the ISNULL
from the SELECT
also, in order to differentiate them. Alternatively, put a WHERE ... IS NOT NULL
instead.