Ignoring NULLs with FIRST_VALUE

Published on: 2018-08-28

Watch this week’s episode on YouTube.

The SQL Server FIRST_VALUE function makes it easy to return the “first value in an ordered set of values.”

The problem is that if that first value happens to be a NULL, there is no easy, built-in way to skip it.

While a UserVoice item exists to add the ability to ignore nulls (go vote!), today, we’re going accomplish that end result with some alternative queries.

The Setup

Here’s the example data we’ll be skipping nulls on:

We’ve got a an integer identity column, two groups of rows, and NULLs that are sprinkled into otherwise unsuspecting integer values.

If we write a query that uses the FIRST_VALUE function, you’ll notice that our NULL gets chosen in group two – not quite what we want:

Let’s look at two queries that will help us get the number 6 into that FirstValue1 column for the second group.

The Contenders

“The Derived FIRST_VALUE”

First up is still the FIRST_VALUE function, but inside of a derived table:

By filtering out NULLs in our derived table query, FIRST_VALUE returns the first non-null value like we want.  We then join that back to the original data and all is right again.

“The Triple Join”

Our second attempt at this query sends us back to the dark ages of SQL Server 2008 before the FIRST_VALUE function existed:

We perform a triple join, with the critical element being our derived table which gets the MIN Id for each group of rows where Value1 IS NOT NULL.  Once we have the minimum Id for each group, we join back in the original data and produce the same final result:

The Performance

Both of the above queries produce the same output – which one should you use in your production code?

Well, the “Derived FIRST_VALUE” query has a lower relative cost than the “Triple Join” query, maybe it’s better?

This isn’t a real-world execution plan though – surely we never scan heaps our production environments.

Let’s add a quick clustered index and see if that changes anything:

Okay, a closer match up but the “Derived FIRST_VALUE” query still appears to have a slight edge.

If we SET STATISTICS IO ON though we start to see a different story:

With only 8 rows of data, our “Derived FIRST_VALUE” query sure is performing a lot of reads.

What if we increase the size of our sample dataset?

And now check our plans and stats IO:

WOW that’s a lot of reads in the “Derived FIRST_VALUE” query.

Conclusion

Besides sharing some solutions, the point I tried to make above is that DON’T TRUST CODE YOU FIND ON THE INTERNET (or in books, or copied from colleagues, etc…)

Both of the above queries will return the first value without NULLs.  But they probably won’t perform exactly the same as they did on my examples above.

Copy the above code for sure – but test it out. See what works better on your specific server configuration, data size, and indexes.  Maybe both queries are terrible and you need a third, better way of doing it (if you write one, let me know!) – but please, please, please, always test your code.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Here’s a Quick Way To Generate a Running Total in SQL Server

Published on: 2017-07-18

SQL in 60 Seconds #2

Historically it’s been difficult to accomplish certain tasks in SQL Server.

Probably the most annoying problem I had to do regularly before SQL Server 2012 was to generate a running total. How can a running total be so easy to do in Excel, but difficult to do in SQL?

SUM(), click, drag, done. Excel, you will always have a place in my heart.

Before SQL Server 2012, the solution to generating a running total involved cursors, CTEs, nested subqueries, or cross applies. This StackOverflow thread has a variety of solutions if you need to solve this problem in an older version of SQL Server.

However, SQL Server 2012’s introduction of window functions makes creating a running total incredibly easy.

First, some test data:

Next, we write our query using the following window function OVER() syntax:

The syntax for our OVER() clause is simple:

  • SUM(Price) specifies which column we are creating the running total on
  • PARTITION BY specifies around what group of data we want to create our “window” — each new window will reset the running total
  • ORDER BY specifies in what order the rows should be sorted before being summed

The results? An easy to write running total:

 

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!