Optimizing merge join

Time:02-22

I read this article: https://bertwagner.com/posts/visualizing-merge-join-internals-and-understanding-their-implications/

It contains this passage:

"If the optimizer added a sort to the upstream merge join though, it may be worth investigating whether it's possible to presort that data so SQL Server doesn't need to sort it on its own. Often times this can be as simple as redefining an included index column to a key column - if you are adding it as the last key column in the index then regression impact is usually minor but you may be able to allow SQL Server to use the merge join without any additional sorting required."

I don't understand this. Is the author suggesting adding an extra column (the one SQL Server currently sorts on by itself) to an already existing index, as its last key column? As I understand it, an index is sorted by its first column, then by the next, and so on up to the last.

E.g., a table with columns "number" (int) and "letter" (varchar) with an index on ("number", "letter") will have the index ordered like this:

  • 1 A
  • 1 D
  • 3 A
  • 3 D

So how can having the "letter" column in the index save the server the trouble of sorting?

Answer:

A merge join can only merge two data streams that are already sorted according to the join predicate (and the engine can read an index either forward or backward to get that order).
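To make that requirement concrete, here is a minimal Python sketch of an inner merge join over two in-memory lists. It is a simplified model, not SQL Server's implementation: it handles ascending order only, keeps everything in memory, and the function name `merge_join` and the sample rows are hypothetical. It walks each input once, which only works because both inputs are sorted on the join key, so it refuses unsorted input.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple[Any, ...]


def merge_join(left: List[Row], right: List[Row],
               key: Callable[[Row], Any]) -> List[Tuple[Row, Row]]:
    """Inner merge join of two inputs already sorted ascending on `key`."""
    # A merge join is only correct on presorted inputs, so check that first.
    for side in (left, right):
        if any(key(a) > key(b) for a, b in zip(side, side[1:])):
            raise ValueError("merge join input is not sorted on the join key")

    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key is smaller: advance the left input
        elif lk > rk:
            j += 1          # right key is smaller: advance the right input
        else:
            # Equal keys: find the run of duplicates on each side
            # and emit every pairing between the two runs.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == rk:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append((l_row, r_row))
            i, j = i_end, j_end
    return out


# Usage: join on the composite key (number, letter).
orders = [(1, "A", "order-1"), (3, "A", "order-2")]
lines = [(1, "A", 10), (1, "A", 11), (3, "D", 12)]
print(merge_join(orders, lines, key=lambda r: (r[0], r[1])))
# [((1, 'A', 'order-1'), (1, 'A', 10)), ((1, 'A', 'order-1'), (1, 'A', 11))]
```

Each input is read once, front to back; that single pass is what makes a merge join cheap, and it is also why an unsorted input has to be sorted before it gets there.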

If the join predicate includes both columns (number and letter) but the index is keyed on number only, the engine can't use that index as a source of a presorted data stream. If the optimizer still chooses a merge join in this case, you'll notice that the plan includes an extra Sort operator feeding the merge join's input. This can be inefficient when the sort is expensive, for example on a large input.
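A small Python model of that situation follows. The scan order shown is hypothetical: with an index keyed on `number` only (and `letter` at most an included column), rows come back ordered by `number`, but the order of `letter` values within one `number` is not guaranteed, so this is just one possible order. The helper `is_sorted_on` is an illustrative name, and `sorted()` stands in for what the extra Sort operator does.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple[int, str]


def is_sorted_on(rows: List[Row], key: Callable[[Row], Any]) -> bool:
    """True if rows are in non-decreasing order of `key`."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def join_key(row: Row) -> Tuple[int, str]:
    """The merge join's key: (number, letter)."""
    return (row[0], row[1])


# Hypothetical scan of an index keyed on `number` only: ordered by number,
# but letters within the same number can come back in any order.
number_only_scan = [(1, "D"), (1, "A"), (3, "D"), (3, "A")]

print(is_sorted_on(number_only_scan, key=lambda r: r[0]))  # True
print(is_sorted_on(number_only_scan, key=join_key))        # False

# In effect, the extra Sort operator reorders the stream on the full
# join key before it reaches the merge join.
sorted_input = sorted(number_only_scan, key=join_key)
print(is_sorted_on(sorted_input, key=join_key))            # True
```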

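The author is saying that if you see a case like this, you could explore changing the existing index by adding the letter column to it as the last key column. That gives exactly the ordering in your example: a key column is sorted within each run of equal values of the key columns before it, whereas an included column has no guaranteed order. With letter in the key, the index is ordered by (number, letter), so the engine can read it directly as a presorted data stream, without the extra Sort operator. And because letter goes last, the existing order on number alone is unchanged, which is why the article expects the regression impact to be minor.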
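The effect of adding `letter` as the last key column can be sketched the same way. Python compares tuples lexicographically, just as a composite index key is ordered, so `sorted()` on `(number, letter)` models a scan of that index. Assumptions: the table rows are made up, `is_sorted_on` is an illustrative helper, and plain Python string comparison stands in for the column's SQL Server collation.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple[int, str]


def is_sorted_on(rows: List[Row], key: Callable[[Row], Any]) -> bool:
    """True if rows are in non-decreasing order of `key`."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


# Hypothetical rows of the question's table, in arbitrary storage order.
table = [(3, "D"), (1, "A"), (3, "A"), (1, "D")]

# Model of scanning an index keyed on (number, letter): lexicographic order.
index_scan = sorted(table, key=lambda r: (r[0], r[1]))
print(index_scan)  # [(1, 'A'), (1, 'D'), (3, 'A'), (3, 'D')]

# This is the order a merge join on (number, letter) needs, so no Sort...
print(is_sorted_on(index_scan, key=lambda r: (r[0], r[1])))  # True
# ...and it is still ordered on the leading column `number` by itself,
# so uses of the index that rely only on `number` see the same order.
print(is_sorted_on(index_scan, key=lambda r: r[0]))          # True
```

The second check is the reason the wider key is usually a low-risk change for existing queries, though it still makes the index key larger.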

However, changing an index can be tricky: improving the performance of this query could degrade another, more important one. Make sure you understand the implications before you change it.
