Home > Software engineering >  Clustered Index and Sorting
Clustered Index and Sorting

Time:10-03

If you run the following query, for example, the rows could come back in any order:

select *
from [table_a];

A clustered index sorts the table based on whatever column(s) you choose.

That being said, if I have a clustered index on [column_a], [column_b], and [column_c] and run the same query from above, will the data ALWAYS come back sorted based on that order since that's the order that the clustered index was created on?

More clarification:

If I add an ORDER BY clause on something not in the index, the execution plan will have a sort operator.

If I have an ORDER BY clause on all the columns used in the clustered index, the execution plan will not have a sort operator.

That's why I asked this question in the first place.

CodePudding user response:

Note: question is cross-posted here.

if I have a clustered index on [column_a], [column_b], and [column_c] and run the same query from above, will the data ALWAYS come back sorted based on that order since that's the order that the clustered index was created on?

No.

SQL Server does not guarantee that it will return data in any order unless you specify the order. It is easy to prove things can go wrong by simply creating a covering, non-clustered index that leads on a different column:

But things can go sideways in other ways, too, e.g. when parallelism or partitioning come into play and SQL Server re-assembles the data from different threads, or when the query gets more complex using joins or filters and a different plan other than a clustered index scan makes sense. Leaving off the order by clause is telling SQL Server: "I don't care about order."

Also, just as a point of clarification:

If I have an ORDER BY clause on all the columns used in the clustered index, the execution plan will not have a sort operator.

...this is true only if the columns are listed in the exact same order as the key definition. ORDER BY c, b, a is "all the columns" but it obviously will produce different output (and require some type of sort operation to get there).

If you expect and want to be comfortable relying on a certain order, always use an ORDER BY clause.

Further reading:

CodePudding user response:

The rule of thumb for SQL reporting is that if you expect the result set to be in a certain order, then you always need to add the appropriate ORDER BY clause. Now, it is possible that SQL Server might choose to scan the index in the order of column_a, column_b and column_c, and therefore return the results in that order, but there is no guarantee that this would happen. But, if you used the following query:

SELECT *
FROM [table_a]
ORDER BY [column_a], [column_b], [column_c];

then an index on the three columns, in the same order as the above ORDER BY clause should be used and would be said to cover the entire query.

  • Related