Guest Posts

Join Elimination: When SQL Server Removes Unnecessary Tables

SentryOne Newsletters

The SQLPerformance.com bi-weekly newsletter keeps you up to speed on the most recent blog posts and forum discussions in the SQL Server community.

eNews is a bi-monthly newsletter with fun information about SentryOne, tips to help improve your productivity, and much more.

Subscribe

Featured Author

Paul White is an independent SQL Server consultant specializing in performance tuning, execution plans, and the query optimizer.

Paul’s Posts

Full video

Guest Author : Bert Wagner (@bertwagner)

Join elimination is one of the many techniques that the SQL Server query optimizer uses to create efficient query plans. Specifically, join elimination occurs when SQL Server can establish equality by using query logic or trusted database constraints to eliminate unnecessary joins. See a full video version of this post on my YouTube channel.

Join Elimination In Action

The simplest way to explain join elimination is through a series of demos. For these examples I’ll be using the WideWorldImporters demo database.

To start things off, we’ll look at how join elimination works when a foreign key is present:

SELECT
  	il.*
  FROM
  	Sales.InvoiceLines il
  	INNER JOIN Sales.Invoices i
  		ON il.InvoiceID = i.InvoiceID;

In this example, we are returning data only from Sales.InvoiceLines where a matching InvoiceID is found in Sales.Invoices. While you might expect the execution plan to show a join operator on the Sales.InvoiceLines and Sales.Invoices tables, SQL Server never bothers looking at Sales.Invoices at all:

SQL Server avoids joining to the Sales.Invoices table because it trusts the referential integrity maintained by the foreign key constraint defined on InvoiceID between Sales.InvoiceLines and Sales.Invoices; if a row exists in Sales.InvoiceLines, a row with the matching value for InvoiceID must exist in Sales.Invoices. And since we are only returning data from the Sales.InvoiceLines table, SQL Server doesn’t need to read any pages from Sales.Invoices at all.

We can verify that SQL Server is using the foreign key constraint to eliminate the join by dropping the constraint and running our query again:

ALTER TABLE [Sales].[InvoiceLines]  
DROP CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];

With no information about the relationship between our two tables, SQL Server is forced to perform a join, scanning an index on our Sales.Invoices table to find matching InvoiceIDs.

From an I/O standpoint, SQL Server must read an extra 124 pages from an index on the Sales.Invoices table, and that’s only because it is able to use a narrow (single column) index created by a different foreign key constraint. This scenario could play out much worse on larger tables or tables that are not indexed appropriately.

Limitations

While the previous example shows the basics of how join elimination works, we need to be aware of a few caveats.

First, let’s add back our foreign key constraint:

ALTER TABLE [Sales].[InvoiceLines]  
  WITH NOCHECK ADD  CONSTRAINT 
  [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices] FOREIGN KEY([InvoiceID])
REFERENCES [Sales].[Invoices] ([InvoiceID]);

If we run our sample query again, we’ll notice that we don’t get a plan that exhibits join elimination; instead we get a plan that scans both of our joined tables.

The reason this occurs is because, when we re-added our foreign key constraint, SQL Server doesn’t know if any data has been modified in the meantime. Any new or changed data may not adhere to this constraint, so SQL Server can’t trust the validity of our data:

SELECT
	f.name AS foreign_key_name
	,OBJECT_NAME(f.parent_object_id) AS table_name
	,COL_NAME(fc.parent_object_id, fc.parent_column_id) AS constraint_column_name
	,OBJECT_NAME (f.referenced_object_id) AS referenced_object
	,COL_NAME(fc.referenced_object_id, fc.referenced_column_id) AS referenced_column_name
	,f.is_not_trusted
FROM 
	sys.foreign_keys AS f
	INNER JOIN sys.foreign_key_columns AS fc
		ON f.object_id = fc.constraint_object_id
WHERE 
	f.parent_object_id = OBJECT_ID('Sales.InvoiceLines');

To re-establish SQL Server’s trust of this constraint, we must check its validity:

ALTER TABLE [Sales].[InvoiceLines] 
WITH CHECK CHECK CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];

On large tables, this operation may take some time, not to mention the overhead of SQL Server validating this data during every insert/update/delete modification going forward.

Another limitation is that SQL Server cannot eliminate joined tables when the query needs to return any data from those potential elimination candidates:

SELECT
	il.*,
	i.InvoiceDate
FROM
	Sales.InvoiceLines il
	INNER JOIN Sales.Invoices i
		ON il.InvoiceID = i.InvoiceID;

Join elimination doesn’t occur in the query above because we are requesting that data from Sales.Invoices is returned, forcing SQL Server to read data from that table.

Finally, it’s important to note that join elimination will not occur when the foreign key has multiple columns, or if the tables are in tempdb. The latter is one of several reasons you shouldn’t try to solve optimization issues by copying your tables into tempdb.

Additional Scenarios

Multiple Tables

Join elimination isn’t only limited to two-table inner joins and tables with foreign key constraints.

For example, we can create an additional table that references our Sales.Invoices.InvoiceID column:

CREATE TABLE Sales.InvoiceClickTracking
  (
  	InvoiceClickTrackingID bigint IDENTITY PRIMARY KEY,
  	InvoiceID int
  	-- other fields would go here 
  );  
GO

ALTER TABLE [Sales].[InvoiceClickTracking]  WITH CHECK 
    ADD  CONSTRAINT [FK_Sales_InvoiceClickTracking_InvoiceID_Sales_Invoices] 
    FOREIGN KEY([InvoiceID])
  REFERENCES [Sales].[Invoices] ([InvoiceID]);

Joining this table into our original sample query will also allow SQL Server to eliminate our Sales.Invoices table:

SELECT 
  	il.InvoiceID,
  	ict.InvoiceID
  FROM
  	Sales.InvoiceLines il
  	INNER JOIN Sales.Invoices i
  		ON il.InvoiceID = i.InvoiceID
  	INNER JOIN Sales.InvoiceClickTracking ict
  		ON i.InvoiceID = ict.InvoiceID;

SQL Server can eliminate the Sales.Invoices table because of the transitive association between these tables’ relationships.

Unique Constraints

Instead of a foreign key constraint, SQL Server will also perform join elimination if it can trust the data relationship with a unique constraint:

ALTER TABLE [Sales].[InvoiceClickTracking] 
  DROP CONSTRAINT [FK_Sales_InvoiceClickTracking_InvoiceID_Sales_Invoices];
  GO
  
ALTER TABLE Sales.InvoiceClickTracking
  ADD CONSTRAINT UQ_InvoiceID UNIQUE (InvoiceID);   
GO 

  SELECT 
  	i.InvoiceID
  FROM
  	Sales.InvoiceClickTracking ict
  	RIGHT JOIN Sales.Invoices i
  		ON ict.InvoiceID = i.InvoiceID;

Outer Joins

As long as SQL Server can infer relationship constraints, other types of joins can experience table elimination as well. For example:

SELECT
	il.InvoiceID
FROM
	Sales.InvoiceLines il
	LEFT JOIN Sales.Invoices i
		ON il.InvoiceID = i.InvoiceID

Since we still have our foreign key constraint enforcing that every InvoiceID in Sales.InvoiceLines must have a corresponding InvoiceID in Sales.Invoices, SQL Server has no problem returning everything from Sales.InvoiceLInes without the need to join to Sales.Invoices:

No Constraint Required

If SQL Server can guarantee that it won’t need data from a certain table, it can potentially eliminate a join.

No join elimination occurs in this query because SQL Server can’t identify if the relationship between Sales.Invoices and Sales.InvoiceLines is 1-to-1, 1-to-0, or 1-to-many. It is forced to read Sales.InvoiceLines to determine if any matching rows are found:

SELECT
	i.InvoiceID
FROM
	Sales.InvoiceLines il
	RIGHT JOIN Sales.Invoices i
		ON il.InvoiceID = i.InvoiceID;

However, if we specify that we want a DISTINCT set of i.InvoiceIDs, every unique value from Sales.Invoices returns from SQL Server regardless of what relationship those rows have with Sales.InvoiceLines.

-- Just to prove no foreign key is at play here

ALTER TABLE [Sales].[InvoiceLines] 
DROP CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];
GO

-- Our distinct result set
SELECT DISTINCT
	i.InvoiceID
FROM
	Sales.InvoiceLines il
	RIGHT JOIN Sales.Invoices i
		ON il.InvoiceID = i.InvoiceID;

Views

One advantage of join elimination is that it can work with views, even if the underlying view query is not able to use join elimination:

-- Add back our FK

ALTER TABLE [Sales].[InvoiceLines]    
WITH CHECK ADD  CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices] 
FOREIGN KEY([InvoiceID])
REFERENCES [Sales].[Invoices] ([InvoiceID]);
GO

-- Create our view using a query that cannot use join elimination
CREATE VIEW Sales.vInvoicesAndInvoiceLines
AS
	SELECT
		i.InvoiceID,
		i.InvoiceDate,
		il.Quantity,
		il.TaxRate
	FROM
		Sales.InvoiceLines il
		INNER JOIN Sales.Invoices i
			ON il.InvoiceID = i.InvoiceID;
GO

-- Join elimination works because we do not select any 
-- columns from the underlying Sales.Invoices table

SELECT Quantity, TaxRate FROM Sales.vInvoicesAndInvoiceLines;

Conclusion

Join elimination is an optimization that SQL Server performs when it determines it can provide an accurate result set without needing to read data from all tables specified in the submitted query. This optimization can provide significant performance improvements by reducing the number of pages SQL Server has to read, however it often comes at the expense of needing to maintain certain database constraints. We can refactor queries to achieve the simpler execution plans that join elimination provides, however having the query optimizer automatically simplify our plans by removing unnecessary joins is a nice benefit.

Again, I invite you to watch the full video version of this post.

About the Author

Guest Author : Bert WagnerBert is a business intelligence developer from Cleveland, Ohio. He loves writing fast-running queries and enjoys helping others learn to be self-sufficient SQL problem solvers. Bert blogs about SQL Server at bertwagner.com and creates SQL Server YouTube videos at youtube.com/c/bertwagner.