Joining on NULLs

Published on: 2019-03-26

Joining on nulls thumbnail
Watch this week’s episode on YouTube

It’s important to be aware of columns that allow NULL values since SQL Server may handle NULLs differently than you might expect.

Today I want to look at what things to consider when joining on columns containing NULL values.

Natural, Composite, NULLable keys

Let’s pretend we have an Account table containing the accounts of various users and an AccountType table describing the different types of accounts:

Account and AccountType tables

These tables have the unfortunate design characteristics of:

  1. They use a natural, composite key of YearOpened and AccountType
  2. NULL is the valid default for AccountType

Not that either of the above attributes are outright bad, just that we need to handle them appropriately. For example, if we want to bring back a description of each user’s account, we might write a query with an inner join like this:

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND a.AccountType = at.AccountType

Only to discover the rows with NULLs are not present:

Inner join with no NULLs

Joining on NULLs

Since it’s not possible to join on NULL values in SQL Server like you might expect, we need to be creative to achieve the results we want.

One option is to make our AccountType column NOT NULL and set some other default value. Another option is to create a new column that will act as a surrogate key to join on instead.

Both of the above options would fix the problem at the source, but what about if we can only make changes to our queries?

One common approach is to convert the NULLs to some other non-NULL value using a function like COALESCE or ISNULL:

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND ISNULL(a.AccountType,'`') = ISNULL(at.AccountType,'`')
Correct results with ISNULL

While this returns the results we want, there are two major issues with this approach:

  1. In the above example we converted NULLs to the ` character. If we had a valid ` character in our data, we would get logically incorrect joins.
  2. Our query can no longer perform index seeks.

The first issue isn’t a huge deal if you can guarantee the character you are replacing NULLs with will never appear in the column of data.

The second issue is more important since ISNULL prevents your query from being SARGable and will cause poor performance on large tables of data.

SARG Killer

Those Compute Scalar operators are forcing SQL Server to Scan the indexes and compute a value for every row.

A More Efficient Solution

If using a function like ISNULL hurts the performance of our queries, what can we do instead?

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND (a.AccountType = at.AccountType OR (a.AccountType IS NULL AND at.AccountType IS NULL))
Correct results with efficiency

This produces the same exact results while allowing SQL Server to Seek when possible and avoid costly row by row computations:

There are no seeks here since I don’t have any additional filters, but the lack of Compute Scalar operators should be enough to prove the point.

While there are a few more variations that can achieve the same results using different execution plans (writing a query that joins non-nulls and unioning it with a query that selects only the nulls, using a computed column to convert the NULLs to non-null values, etc…) the key to good performance is to choose a solution that will not force SQL Server to compute values for every single row.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

5 thoughts on “Joining on NULLs”

  1. A pattern I’m quite fond of that should result in the same plan without the nested logic in the join is:

    JOIN … ON EXISTS (SELECT
    a.YearOpened,
    a.AccountType
    INTERSECT
    SELECT
    at.YearOpened,
    at.AccountType
    )

    which takes advantage of the fact that UNION/EXCEPT/INTERSECT don’t honour the usual three-valued logic. One disadvantage (other than the fact it may baffle those reading your code at first) is that if the YearOpened columns on the base tables are changed to allow NULLs, the INTERSECT code will start joining on NULLs in those columns, which you might not want and wouldn’t happen with the “a.YearOpened = at.YearOpened” code.

    Depending on the impact of such a thing happening, the join could be changed to use equality operators for all NOT NULL columns and INTERSECT for all NULL columns, which may or not look messy depending on how you like your code. Being able to SCHEMABIND stored procedure code would be a nice way to protect your code from table changes like that, but I guess it’s just another argument in favour of regression testing and dependency tracking.

    1. Thanks for the post! I’d been talking to somebody about the very issue the day before you posted.

      Clarifying the Caveats sounds like either a euphemism or *the* best album title

  2. Not sure of most efficient method, but I have tended to use UNION ALL instead of the OR

    SELECT a.UserId, at.YearOpened, at.AccountType, at.Description
    FROM dbo.Account a
    INNER JOIN dbo.AccountType at
    ON a.YearOpened = at.YearOpened
    AND a.AccountType = at.AccountType
    UNION ALL
    SELECT a.UserId, at.YearOpened, at.AccountType, at.Description
    FROM dbo.Account a
    INNER JOIN dbo.AccountType at
    ON a.YearOpened = at.YearOpened
    AND a.AccountType IS NULL AND at.AccountType IS NULL

    as performance has been OK (and sometimes the performance of OR has been poor)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.