Aaron states that the Community Influencer of the Year award goes to, “someone who has made a dramatic impact on the SQL Server community.” This type of recognition is wonderful to hear and I am honored to have such kind words come from someone like Aaron.
I’m especially flattered since the previous award recipients are Andy Mallon (2016) and Drew Furgiuele (2017). Being recognized in the same league as those two feels amazing. Andy and Drew are incredible and inspiring, both in the data platform community and outside of it.
2018 was a great year and I’m looking forward to 2019. The number of people I’ve befriended this year and who have helped me along the way is staggering; thank you to each and every one of you.
And finally, thank you all for following me along on this journey. I truly appreciate the interactions I have with you in-person, on Twitter, in the comments, and everywhere else.
Merge joins are theoretically the fastest* physical join operators available; however, they require that data from both inputs be sorted:
The base algorithm works as follows: SQL Server compares the first rows from both sorted inputs. It then continues comparing the next rows from the second input as long as the values match the first input’s value.
Once the values no longer match, SQL Server increments the row of whichever input has the smaller value – it then continues performing comparisons and outputting any joined records. (For more detailed information, be sure to check out Craig Freedman’s post on merge joins.)
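To see the operator in action, here’s a minimal sketch – the tables and data are hypothetical, and the join hint is only there to make the merge join easy to spot in a tiny demo:
DROP TABLE IF EXISTS dbo.SortedA;
DROP TABLE IF EXISTS dbo.SortedB;
CREATE TABLE dbo.SortedA (Id int PRIMARY KEY, ColA char(10));
CREATE TABLE dbo.SortedB (Id int PRIMARY KEY, ColB char(10));
INSERT INTO dbo.SortedA VALUES (1,'a'),(2,'b'),(4,'c');
INSERT INTO dbo.SortedB VALUES (1,'x'),(3,'y'),(4,'z');
-- Both clustered indexes already order the rows on Id, so the plan shows a
-- Merge Join with no Sort operators; SQL Server walks each input once,
-- advancing whichever side has the smaller value (2 vs. 3 above).
SELECT a.Id, a.ColA, b.ColB
FROM dbo.SortedA a
INNER MERGE JOIN dbo.SortedB b ON a.Id = b.Id;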
This is efficient because in most instances SQL Server never has to go back and read any rows multiple times. The exception happens when duplicate values exist in both input tables (or rather, when SQL Server doesn’t have metadata available proving that duplicates don’t exist in both tables) and SQL Server has to perform a many-to-many merge join:
Note: The image above and the explanation below are “good enough” for understanding this process for practical purposes – if you want to dive into the peek-ahead buffers, optimizations, and other inner workings of this process, I highly recommend reading through Hugo Kornelis’s reference on merge joins.
A many-to-many join forces SQL Server to write any duplicated values in the second table into a worktable in tempdb and do the comparisons there. If those duplicated values are also duplicated in the first table, SQL Server then compares the first table’s values to those already stored in the worktable.
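Here’s a hedged sketch of that scenario – the tables are hypothetical, but with duplicates on both sides of the join key the Merge Join operator’s Many to Many property shows True in the actual plan:
DROP TABLE IF EXISTS dbo.DupesA;
DROP TABLE IF EXISTS dbo.DupesB;
CREATE TABLE dbo.DupesA (JoinKey int NOT NULL, INDEX CX_DupesA CLUSTERED (JoinKey));
CREATE TABLE dbo.DupesB (JoinKey int NOT NULL, INDEX CX_DupesB CLUSTERED (JoinKey));
INSERT INTO dbo.DupesA VALUES (1),(1),(2);
INSERT INTO dbo.DupesB VALUES (1),(1),(2),(2);
-- Neither clustered index is unique, so SQL Server can't prove the inputs are
-- duplicate-free and must buffer DupesB's duplicated values in a tempdb worktable.
SELECT a.JoinKey
FROM dbo.DupesA a
INNER MERGE JOIN dbo.DupesB b ON a.JoinKey = b.JoinKey;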
What Do Merge Joins Reveal?
Knowing the internals of how a merge join works allows us to infer what the optimizer thinks about our data and the join’s upstream operators, helping us focus our performance tuning efforts.
Here are a few scenarios to consider the next time you see a merge join being used in your execution plan:
The optimizer chooses to use a merge join when the input data is already sorted or when SQL Server can sort the data for a low enough cost. Additionally, the optimizer is fairly pessimistic when calculating the costs of merge joins (Joe Obbish has a great explanation of this), so if a merge join makes its way into your plans, it probably means it is fairly efficient.
While a merge join may be efficient, it’s always worth looking at why the data coming into the merge join operator is already sorted:
If it’s sorted because the merge join is pulling data directly from an index sorted on your join keys, then there is not much to be concerned about.
If the optimizer added a sort upstream of the merge join though, it may be worth investigating whether it’s possible to presort that data so SQL Server doesn’t need to sort it on its own. Often this can be as simple as redefining an included index column as a key column – if you add it as the last key column in the index, the regression risk is usually minor, and it may allow SQL Server to use the merge join without any additional sorting (see the sketch after this list).
If your inputs contain many duplicates, it may be worth checking whether a merge join is really the most efficient operator for the join. As outlined above, many-to-many merge joins require tempdb usage, which can become a bottleneck!
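For example, here’s a minimal sketch of the presorting index change described above – the table, index, and column names are hypothetical, and whether the sort actually disappears depends on the rest of your plan:
-- Before: rows come back sorted on CustomerId only, so a merge join on
-- (CustomerId, OrderDate) still needs SQL Server to add a Sort.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, TotalDue);
-- After: OrderDate is promoted from an included column to the last key column,
-- so the index output is already sorted on (CustomerId, OrderDate).
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalDue)
WITH (DROP_EXISTING = ON);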
So while merge joins are typically not the high-cost problem spots in your execution plans, it’s always worth investigating upstream operators to see if some additional improvements can be made.
*NOTE: There are always exceptions to the rule. Merge joins have the fastest algorithm since each row only needs to be read once from the source inputs. Also, optimizations occurring in other join operators can give those operators better performance under certain conditions.
For example, a single-row outer table joined to an indexed inner table will perform better with a nested loops join than with a merge join because of the nested loops join’s optimizations:
DROP TABLE IF EXISTS T1;
CREATE TABLE T1 (Id int identity PRIMARY KEY, Col1 CHAR(1000));
INSERT INTO T1 VALUES('');
DROP TABLE IF EXISTS T2;
CREATE TABLE T2 (Id int identity PRIMARY KEY, Col1 CHAR(1000));
INSERT INTO T2 VALUES('');
-- Turn on execution plans and check actual rows for T2
SELECT * FROM T1 INNER LOOP JOIN T2 ON T1.Id = T2.Id;
SELECT * FROM T1 INNER MERGE JOIN T2 ON T1.Id = T2.Id;
There might also be instances where merge joins on inputs with many duplicate records, which require worktables, end up slower than a nested loops join.
As I mentioned though, I typically find these types of scenarios to be the exceptions when encountering merge joins in the real world.
Everyone has their own method of reading an execution plan when performance tuning a slow SQL query. One of the first things I like to look at are what kind of join operators are being used:
These three little icons may not seem like the most obvious place to begin troubleshooting a slow query, but with larger plans especially I like starting with a quick glance at the join operators because they allow you to infer a lot about what SQL Server thinks about your data.
This will be a three-part series where we’ll learn how each join algorithm works and what it can reveal about our upstream execution plan operators.
The base algorithm works like this: SQL Server takes a value from the outer table and compares it to every value in the inner table. Once every inner value has been checked, SQL Server moves to the next value in the outer table and the process repeats until every value from our outer table has been compared to every value in our inner table.
This description is a worst case example of the performance of a nested loop join. Several optimizations exist that can make the join more efficient. For example, if the inner table join values are sorted (because of an index you created or a spool that SQL Server created), SQL Server can process the rows much faster:
In the above animation, the inner input is an index sorted on the join key, allowing SQL Server to seek directly to the rows it needs and reducing the total number of comparisons that need to be made.
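Here’s a hedged sketch of that optimization – the tables and row counts are hypothetical – where an index on the inner table’s join key lets the nested loops join seek instead of scan:
DROP TABLE IF EXISTS dbo.OuterSmall;
DROP TABLE IF EXISTS dbo.InnerBig;
CREATE TABLE dbo.OuterSmall (Id int PRIMARY KEY);
CREATE TABLE dbo.InnerBig (Id int PRIMARY KEY, Payload char(100));
INSERT INTO dbo.OuterSmall VALUES (42);
INSERT INTO dbo.InnerBig (Id, Payload)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_columns a CROSS JOIN sys.all_columns b;
-- The actual plan shows a Nested Loops join performing a single clustered
-- index seek into InnerBig instead of comparing the one outer row to all
-- 10,000 inner rows.
SELECT o.Id, i.Payload
FROM dbo.OuterSmall o
INNER LOOP JOIN dbo.InnerBig i ON o.Id = i.Id;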
Knowing the internals of how a nested loops join works allows us to infer what the optimizer thinks about our data and the join’s upstream operators, helping us focus our performance tuning efforts.
Here are a few scenarios to consider the next time you see a nested loops join being used in your execution plan:
Nested loops joins are CPU intensive; at worst, every row needs to be compared to every other row and this can take some time. This means when you see a nested loops join, SQL Server probably thinks that one of the two inputs is relatively small.
… and if one of the inputs is relatively small, great! If instead you see upstream operators moving large amounts of data, you may have an estimation problem in this area of the plan and may need to update statistics, add indexes, or refactor the query so SQL Server can produce better estimates (and maybe choose a more appropriate join).
Nested loops sometimes accompany RID or key lookups. I always check for one of these because they often leave room for some performance improvements:
If a RID lookup exists, it’s usually easy enough to add a clustered index to that underlying table to squeeze out some extra performance.
If either a RID or key lookup exists, I always check what columns are being returned to see if a smaller index could be used instead (by adding a column to the key or included columns of an existing index) or if the query can be refactored to not bring back those columns (e.g. getting rid of the SELECT *) – see the sketch after this list.
Nested loops joins do not require data to be sorted on input. However, performance can improve with an indexed inner data source (see animation above), and SQL Server might choose a more efficient operator if the inputs are both sorted.
At the very least, nested loops joins make me check whether the input data is unsorted because of some upstream transformations or because of missing indexes.
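For example, here’s a hedged sketch of removing a key lookup by making an index covering – the table, index, and columns are hypothetical, and it assumes the index already exists:
-- This query seeks on an index keyed on OrderDate, but TotalDue isn't in the
-- index, so every matching row triggers a key lookup that a nested loops
-- operator joins back in:
SELECT OrderDate, TotalDue
FROM dbo.Orders
WHERE OrderDate = '2018-12-31';
-- Including TotalDue makes the index covering; the key lookup and its nested
-- loops join disappear from the plan.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
INCLUDE (TotalDue)
WITH (DROP_EXISTING = ON);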
So while nested loops in your plans will always require more investigation, looking at them and the operators around them can provide some good insight into what SQL Server thinks about your data.