Ignoring NULLs with FIRST_VALUE

Watch this week's video on YouTube

The SQL Server FIRST_VALUE function makes it easy to return the "first value in an ordered set of values."

The problem is that if that first value happens to be a NULL, there is no easy, built-in way to skip it.

While a UserVoice item exists to add the ability to ignore nulls (go vote!), today we're going to accomplish that end result with some alternative queries.
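For context, the syntax we wish we had looks something like this - it's part of the ANSI standard and available in some other database engines, but not in SQL Server as of this writing (##Data is the example table created in the next section):

SELECT
       Id,
       GroupId,
       Value1,
       FIRST_VALUE(Value1) IGNORE NULLS OVER (PARTITION BY GroupId ORDER BY Id) AS FirstValue1
FROM
       ##Data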

The Setup

Here's the example data we'll be skipping nulls on:

CREATE TABLE ##Data
(
       Id int IDENTITY(0,1),
       GroupId int,
       Value1 int
);
GO
INSERT INTO ##Data VALUES (1,1)
INSERT INTO ##Data VALUES (1,1)
INSERT INTO ##Data VALUES (1,3)
INSERT INTO ##Data VALUES (2,NULL)
INSERT INTO ##Data VALUES (2,NULL)
INSERT INTO ##Data VALUES (2,6)
INSERT INTO ##Data VALUES (2,4)
INSERT INTO ##Data VALUES (2,5);
GO

[Screenshot: the rows in ##Data]

We've got an integer identity column, two groups of rows, and NULLs sprinkled into otherwise unsuspecting integer values.

If we write a query that uses the FIRST_VALUE function, you'll notice that our NULL gets chosen in group two - not quite what we want:

SELECT
       Id,
       GroupId,
       Value1,
       FIRST_VALUE(Value1) OVER (PARTITION BY GroupId ORDER BY Id) AS FirstValue1
FROM
       ##Data

[Screenshot: query results - NULL appears as FirstValue1 for group 2]

Let's look at two queries that will help us get the number 6 into that FirstValue1 column for the second group.

The Contenders

"The Derived FIRST_VALUE"

First up is still the FIRST_VALUE function, but inside of a derived table:

SELECT
    d.Id,
    d.GroupId,
    d.Value1,
    d2.FirstNotNullValue1
FROM
    ##Data d
    INNER JOIN
    (
    SELECT DISTINCT
        GroupId,
        FIRST_VALUE(Value1) OVER (PARTITION BY GroupId ORDER BY Id) as FirstNotNullValue1
    FROM ##Data
    WHERE Value1 IS NOT NULL
    ) d2
        ON d.GroupId = d2.GroupId

By filtering out NULLs in our derived table query, FIRST_VALUE returns the first non-null value like we want.  We then join that back to the original data and all is right again.

[Screenshot: query results - FirstNotNullValue1 is 6 for group 2]

"The Triple Join"

Our second attempt at this query sends us back to the dark ages of SQL Server 2008 before the FIRST_VALUE function existed:

SELECT
    d.Id,
    d.GroupId,
    d.Value1,
    d2.Value1 AS FirstNotNullValue1
FROM
    ##Data d
    LEFT JOIN
    (
    SELECT
        GroupId,
        MIN(Id) AS FirstNotNullIdValue1
    FROM
        ##Data
    WHERE
        Value1 IS NOT NULL
    GROUP BY
        GroupId
    ) m
        ON d.GroupId = m.GroupId
    INNER JOIN ##Data d2
        ON m.FirstNotNullIdValue1 = d2.Id;

We perform a triple join, with the critical element being our derived table which gets the MIN Id for each group of rows where Value1 IS NOT NULL.  Once we have the minimum Id for each group, we join back in the original data and produce the same final result:

[Screenshot: query results - same output as the previous query]

The Performance

Both of the above queries produce the same output - which one should you use in your production code?

Well, the "Derived FIRST_VALUE" query has a lower relative cost than the "Triple Join" query, maybe it's better?

[Screenshot: execution plans comparing the relative costs of the two queries]

This isn't a real-world execution plan though - surely we never scan heaps in our production environments.

Let's add a quick clustered index and see if that changes anything:

CREATE CLUSTERED INDEX CL_Id ON ##Data (GroupId,Id,Value1)

[Screenshot: execution plans after adding the clustered index]

Okay, a closer matchup, but the "Derived FIRST_VALUE" query still appears to have a slight edge.

If we SET STATISTICS IO ON though we start to see a different story:

[Screenshot: STATISTICS IO output for both queries]
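If you want to follow along, enabling the statistics output is a one-liner (it gets turned back off before the bulk insert below):

SET STATISTICS IO, TIME ON;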

With only 8 rows of data, our "Derived FIRST_VALUE" query sure is performing a lot of reads.

What if we increase the size of our sample dataset?

SET STATISTICS IO, TIME OFF;
SET NOCOUNT ON;
GO
INSERT INTO ##Data (GroupId, Value1)  
SELECT GroupId, Value1 FROM ##Data
GO 10
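The GO 10 batch separator reruns that insert ten times, doubling the table on each pass, so the original 8 rows grow to 8 × 2^10 = 8,192 rows:

SELECT COUNT(*) AS TotalRows FROM ##Data;  -- 8192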

And now check our plans and stats IO:

[Screenshots: execution plans and STATISTICS IO output against the larger dataset]

WOW that's a lot of reads in the "Derived FIRST_VALUE" query.

Conclusion

Besides sharing some solutions, the point I tried to make above is this: DON'T TRUST CODE YOU FIND ON THE INTERNET (or in books, or copied from colleagues, etc...).

Both of the above queries will return the first value without NULLs.  But they probably won't perform exactly the same as they did in my examples above.

Copy the above code for sure - but test it out. See what works better on your specific server configuration, data size, and indexes.  Maybe both queries are terrible and you need a third, better way of doing it (if you write one, let me know!) - but please, please, please, always test your code.

Should You Use Index Hints?

Watch this week's video on YouTube

One of the things that the SQL Server query optimizer does is determine how to retrieve the data requested by your query.

Usually it does a pretty good job, which is great because if it didn't then we'd be spending most of our days programming sorting and joining algorithms instead of having fun actually working with our data.

Sometimes the query optimizer has a lapse in judgment and creates a less-than-efficient plan, requiring us to step in and save the day.

Index Hints Give You Control

One way to "fix" a poor performing plan is to use an index hint.  While we normally have no control over how SQL Server retrieves the data we requested, an index hint forces the  query optimizer to use the index specified in the hint to retrieve the data (hence, it's really more of a "command" than a "hint").

Sometimes when I feel like I'm losing control I like using an index hint to show SQL Server who's boss.  I occasionally will also use index hints when debugging poor performing queries because it allows me to confirm whether using an alternate index would improve performance without having to overhaul my code or change any other settings.

...But Sometimes That's Too Much Power

While I like using index hints for short-term debugging scenarios, that's about the only time they should be used because they can create some pretty undesirable outcomes.

For example, let's say I have this nice simple query and index here:

CREATE INDEX IX_OwnerUserId_CreationDate_Includes
ON dbo.Posts (OwnerUserId, CreationDate) INCLUDE (AcceptedAnswerId, ClosedDate, CommentCount, FavoriteCount, LastActivityDate);

SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts
WHERE
    OwnerUserId < 1000

This index was specifically created for a different query running on the Posts table, but it will also get used by the simple query above.

Executing this query without any hints causes SQL Server to use it anyway (since it's a pretty good index for the query), and we get decent performance: only 1002 logical reads.

[Screenshot: execution plan] I wish all of my execution plans were this simple.

Let's pretend we don't trust the SQL Server optimizer to always choose this index, so instead we force it to use it by adding a hint:

SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts WITH (INDEX(IX_OwnerUserId_CreationDate_Includes))
WHERE
    OwnerUserId < 1000

With this hint, the query performs exactly the same: 1002 logical reads, a good index seek, etc...

But what happens if in the future a better index gets added to the table?

CREATE INDEX IX_OwnerUserId_AcceptedAnswerId_Includes
ON dbo.Posts (OwnerUserId, AcceptedAnswerId) INCLUDE (LastEditorUserId, ParentId);

If we run the query WITHOUT the index hint, we'll see that SQL Server actually chooses this new index because it's smaller and we can get the data we need in only 522 logical reads:

[Screenshot: execution plan] This execution plan looks the same, but you'll notice the smaller, more data-dense index is being used.

If we had let SQL Server do its job, it would have given us a great performing query!  Instead, we decided to intervene and hint (i.e. force) it to use a sub-optimal index.

Things Can Get Worse

The above example is pretty benign - sure, without the hint SQL Server would have read about half as many pages, but this isn't a drastic difference in this scenario.

What could be disastrous is if, because of the hint, the query optimizer decides to make a totally different plan that isn't nearly as efficient.  Or if one day someone drops the hinted index, causing the query with the hint to outright fail:

[Screenshot: error message - the hinted query fails because the index no longer exists]
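As a minimal repro of that failure (assuming the index and hinted query from above), dropping the index is all it takes - the query errors out instead of falling back to another index:

DROP INDEX IX_OwnerUserId_CreationDate_Includes ON dbo.Posts;
GO

-- Fails with an "index does not exist" error rather than choosing a different index
SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts WITH (INDEX(IX_OwnerUserId_CreationDate_Includes))
WHERE
    OwnerUserId < 1000;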

Index hints can be nice to use in the short term for investigating, testing, and debugging.  However, they are almost never the correct long-term solution for fixing query performance.

Instead, it's better to look for the root cause of a poorly performing query: maybe you need to rebuild stats on an index or determine if the cardinality estimator being used is not ideal.  You might also benefit from rewriting a terribly written query.
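For example, refreshing statistics or testing a different cardinality estimator are both cheap experiments (a sketch; adjust the table name to your own schema, and note that USE HINT requires SQL Server 2016 SP1 or later):

-- Refresh statistics on the table with a full scan
UPDATE STATISTICS dbo.Posts WITH FULLSCAN;

-- Try the query under the legacy cardinality estimator without changing any database settings
SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts
WHERE
    OwnerUserId < 1000
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));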

Any of these options will likely help you create a better, long-term, flexible solution rather than forcing SQL Server to use the same hard-coded, potentially sub-optimal index forever.

Pinal Dave Helps Me Fix My Performance Tuning Problems

Watch this week's video on YouTube

This week I was fortunate enough to film a video in collaboration with Pinal Dave, the SQL Authority himself.  Pinal is creative, hilarious, and kind; making this video with him was A BLAST!

Although the video is a little tongue in cheek, Pinal's recommendations are very real: I've encountered plenty of scenarios where these solutions fixed slow queries.  Will these recommendations fix the problem in every situation?  Of course not, but they are a great place to start.

Instead of creating a text version of the concepts covered in the video (you should really watch it), I thought it would be fun to do a behind-the-scenes narrative of how the video came together because it is unlike any other project I've done before.

The Idea

After agreeing to make a video together, we tossed around a few ideas.  Because we live in different time zones, we thought it would be fun to do something where I kept waking Pinal up in the middle of the night.

We iterated over what SQL Server examples to use (originally the second example was going to show my queries running out of space because autogrowth was turned off).  We also ended up adding another example after my wife suggested that having it build to three scenarios instead of two would be funnier - I agree!

Asynchronous Filming

You've probably already figured it out, but I didn't really wake Pinal up in the video (honestly, I think midnight would be too early to wake him up anyway; in our back-and-forth emails, I was seeing responses from him in the 1-2am range).

I filmed a preliminary version of my parts of the video, very roughly edited them together, and sent it over to Pinal.

He then filmed his segments, giving me lots of great footage (I'm not sure if it was ad-libbed or not, but I was dying of laughter when watching through his clips).

Then I re-filmed my parts to try to match his dialog as closely as possible.  Re-filming my parts also allowed me to self-edit and not ramble as much.

Everything Else

After that, it was just the usual process of editing, color correction, audio processing, etc...

I'm happy with how it turned out, especially given all of the technical challenges we had with filming separately.

Major thanks again to Pinal for being supportive and willing to make a fun SQL Server video.  Enjoy!

Prioritizing Index Performance Pains - What I Learned From Brent Ozar's Master Index Tuning Class

Watch this week's video on YouTube

I love continuing my education in SQL Server.

Recently I was fortunate enough to take Brent Ozar's 3-day live online Master Index Tuning training class.

Since I had never taken an online class before, I thought it'd be fun to write a mini-review of what I thought about the class as well as what lesson from the course had the biggest impact on me.


As excited as I was to take the class, I was a little concerned about a couple of aspects of it. My main concerns before taking the online class were:

  • Would I have difficulty focusing in an online class?  YouTube, Facebook, and other distractions are only a few clicks away.
  • Will it feel like a 1-way dialogue since I wouldn't be there in person to ask questions, participate in discussion, and interact with other students?

My first concern was immediately invalidated when we started working on our labs. During the 3 days, Brent does the traditional training of explaining problems, techniques, and solutions (in his own humorous way of course) BUT THEN instead of moving on to the next topic he gives the students time to work through problems on their own in labs he set up.

This was huge. Every student gets an 8-core, 64GB Azure VM running simulated workloads that provide a great real-world representation of an underperforming SQL Server.

Instead of taking notes for 3 days and hoping that I would remember to try them out when I returned to work, the labs allowed me to apply those new concepts immediately. I think working through these problems in a real-world scenario is not only practical, but probably the best way of retaining the information you learn. I'm not sure how much work it was for Brent to set up these labs (guess: a ton), but as a student they really pay off.

As for my concern about the course not being interactive, that turned out to not be a problem either. During the duration of the class (as well as for a couple hours before and after the class each day) there was a Slack chatroom where Brent and the students could interact.

The Slack chat had two huge advantages:

  1. Questions were typed out and written down, making it easy to know who asked what and what question Brent was responding to.  It also made it easy to copy and paste the chat for my notes.  It was also nice to have the ability to type a question overnight and get an answer the next morning.
  2. We were asked to post all of our lab queries into the chat when we thought we had found a solution to the performance problems. This was AWESOME because not only did I get 1-on-1 code review and advice from Brent, I got to see what everyone else was submitting as an answer.
    Each lab scenario could usually be solved a few different ways, so this meant other students were submitting completely different solutions than what I came up with.  Seeing other people's approach to problem solving was really cool. Even better, I could try out their solutions in my own lab by copying and pasting the queries from Slack into my lab VM, seeing how those different solutions compared to my own.

So while I had some concerns before the class started, it turns out they weren't problems at all and the class was great.

My Favorite Lesson: Prioritizing Performance Solutions

The Master Index Tuning training covered everything needed to practically troubleshoot and solve index performance issues.  While Brent taught many techniques, from everyday solutions to advanced once-in-a-while techniques for certain edge cases, my biggest takeaway was how Brent taught us to prioritize the low-hanging-index-tuning-fruit first.

I'm a query tuner by nature.  Before taking this training, my go-to solution for fixing a performance problem was to start refactoring a query or correcting the indexes for that one particular query.

Instead, Brent recommended that we focus on the actions with the largest performance returns first: at a database or instance level, clean up duplicate (or near duplicate) indexes, drop unused indexes, add missing indexes, etc...
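As a rough example of that kind of server-level cleanup, a query along these lines surfaces nonclustered indexes that are written to but never read (a sketch using sys.dm_db_index_usage_stats; usage stats reset when SQL Server restarts, so review candidates carefully rather than dropping them automatically):

SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    ISNULL(us.user_updates, 0) AS UserWrites,
    ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS UserReads
FROM
    sys.indexes i
    LEFT JOIN sys.dm_db_index_usage_stats us
        ON us.object_id = i.object_id
        AND us.index_id = i.index_id
        AND us.database_id = DB_ID()
WHERE
    OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    AND i.type_desc = 'NONCLUSTERED'
    AND ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0
ORDER BY
    ISNULL(us.user_updates, 0) DESC;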

Many individual query performance issues can be fixed by changing indexes alone, without ever needing to touch the queries themselves.  Fixing poorly created indexes on your server has the additional advantage that it can improve performance of MULTIPLE queries running on your server simultaneously.

The takeaway here is that unless your server is already in tip-top shape, you'll probably get more bang for your buck fixing duplicate, inadequate, and missing index issues than you will by tuning individual queries.  This may sound obvious, but from my "every problem is a query tuning problem" standpoint this was a nice reminder that sometimes looking at the bigger picture will produce larger returns than focusing only on the details.

Final Verdict: Is It Worth It?

Yes.  Especially if you are the kind of person who learns best by doing and wants real-world index tuning advice.  By the end of the class I felt confident enough to immediately go back to work and start applying what I had learned from Brent and practiced in the labs.

In-Memory OLTP: A Case Study

Watch this week's video on YouTube

When In-Memory OLTP was first released in SQL Server 2014, I was excited to start using it.  All I could think was "my queries are going to run so FAST!"

Well, I never got around to implementing In-Memory OLTP.  Besides having an incompatible version of SQL Server at the time, the in-memory features had too many limitations for my specific use-cases.

Fast forward a few years, and I've done nothing with In-Memory OLTP.  Nothing, that is, until I saw Erin Stellato present at our Northern Ohio SQL Server User Group a few weeks ago - her presentation inspired me to take a look at In-Memory OLTP again to see if I could use it.

Use case: Improving ETL staging loads

After being refreshed on the ins and outs of in-memory SQL Server, I wanted to see if I could apply some of the techniques to one of my ETLs.

The ETL consists of two major steps:

  1. Shred documents into row/column data and then dump that data into a staging table.
  2. Delete some of the documents from the staging table.

In the real world, there's a step 1.5 that does some processing of the data, but it's not relevant to these in-memory OLTP demos.

So step one was to create my staging tables.  The memory-optimized table is called "NewStage1" and the traditional disk-based table is called "OldStage1":

DROP DATABASE IF EXISTS InMemoryTest;
GO
CREATE DATABASE InMemoryTest;
GO
USE InMemoryTest;
GO

ALTER DATABASE InMemoryTest ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA;
GO 
ALTER DATABASE InMemoryTest ADD FILE (name='imoltp_mod1', filename='C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA\imoltp_mod1') TO FILEGROUP imoltp_mod;
GO  
ALTER DATABASE InMemoryTest SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON;
GO 
ALTER DATABASE InMemoryTest SET RECOVERY SIMPLE
GO

--Numbers Table
-- This needs to be in-memory to be called from a natively compiled procedure
DROP TABLE IF EXISTS InMemoryTest.dbo.Numbers;
GO
CREATE TABLE InMemoryTest.dbo.Numbers
(
    n int
    INDEX ix_n NONCLUSTERED HASH (n) WITH (BUCKET_COUNT=400000)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY);  
GO 

INSERT INTO dbo.Numbers (n)
SELECT TOP (4000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

-- Set up on-disk tables
DROP TABLE IF EXISTS InMemoryTest.dbo.OldStage1;
GO
CREATE TABLE InMemoryTest.dbo.OldStage1
(
    Id int,
    Col1 uniqueidentifier,
    Col2 uniqueidentifier,
    Col3 varchar(1000),
    Col4 varchar(50),
    Col5 varchar(50),
    Col6 varchar(50),
    Col7 int,
    Col8 int,
    Col9 varchar(50),
    Col10 varchar(900),
    Col11 varchar(900),
    Col12 int,
    Col13 int,
    Col14 bit
);
GO
CREATE CLUSTERED INDEX CL_Id ON InMemoryTest.dbo.OldStage1 (Id);
GO


--  Set up in-memory tables and natively compiled procedures
DROP TABLE IF EXISTS InMemoryTest.dbo.NewStage1;
GO
CREATE TABLE InMemoryTest.dbo.NewStage1
(
    Id int,
    Col1 uniqueidentifier,
    Col2 uniqueidentifier,
    Col3 varchar(1000),
    Col4 varchar(50),
    Col5 varchar(50),
    Col6 varchar(50),
    Col7 int,
    Col8 int,
    Col9 varchar(50),
    Col10 varchar(900),
    Col11 varchar(900),
    Col12 int,
    Col13 int,
    Col14 bit
    INDEX ix_id NONCLUSTERED HASH (id) WITH (BUCKET_COUNT=10)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY);  
GO

A few things to keep in mind:

  • The tables have the same columns and datatypes, with the only difference being that the NewStage1 table is memory optimized.
  • My database is using simple recovery so I am able to perform minimal logging/bulk operations on my disk-based table.
  • Additionally, I'm using the SCHEMA_ONLY durability setting.  This gives me outstanding performance because there is no writing to the transaction log!  However, this means if I lose my in-memory data for any reason (crash, restart, corruption, etc...) I am completely out of luck.  This is fine for my staging data scenario since I can easily recreate the data if necessary.  (A durable alternative is sketched right after this list.)
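For completeness, here's what that durable alternative could look like - a trimmed sketch with a hypothetical table name, showing only a few of the columns.  SCHEMA_AND_DATA tables log their rows and require a primary key, so inserts and deletes will be slower than the SCHEMA_ONLY version:

CREATE TABLE InMemoryTest.dbo.NewStage1_Durable
(
    Id int NOT NULL,
    Col1 uniqueidentifier,
    Col3 varchar(1000),
    CONSTRAINT PK_NewStage1_Durable PRIMARY KEY NONCLUSTERED HASH (Id) WITH (BUCKET_COUNT=10)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_AND_DATA);
GO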

Inserting and deleting data

Next I'm going to create procedures for inserting and deleting my data into both my new and old staging tables:

DROP PROCEDURE IF EXISTS dbo.Insert_OldStage1;
GO
CREATE PROCEDURE dbo.Insert_OldStage1
    @Id int,
    @Rows int
AS
BEGIN
    INSERT INTO InMemoryTest.dbo.OldStage1 (Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14)
    SELECT Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14
    FROM
    (
    SELECT
    @Id as Id,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col1,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col2,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col3,
    'aaaaaaaaaaaaaaaaaaaa' as Col4,
    'aaaaaaaaaaaaaaaaaaaa' as Col5,
    'aaaaaaaaaaaaaaaaaaaa' as Col6,
    0 as Col7,
    0 as Col8,
    'aaaaaaaaaaaaaaaaaaaa' as Col9,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col10,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col11,
    1 as Col12,
    1 as Col13,
    1 as Col14
    )a
    CROSS APPLY
    (
    SELECT TOP (@Rows) n FROM dbo.Numbers
    )b
END

DROP PROCEDURE IF EXISTS dbo.Delete_OldStage1;
GO
CREATE PROCEDURE dbo.Delete_OldStage1
    @Id int
AS
BEGIN
    -- Use loop to delete to prevent filling transaction log
    DECLARE 
        @Count int = 0,
        @for_delete int,
        @chunk_size int = 1000000

    SELECT @for_delete = COUNT(Id) FROM InMemoryTest.dbo.OldStage1 
                            WHERE Id = @Id;

    WHILE (@Count < @for_delete)
    BEGIN
        SELECT @Count = @Count + @chunk_size;

        BEGIN TRAN
            DELETE TOP(@chunk_size) FROM InMemoryTest.dbo.OldStage1 WHERE Id = @Id
        COMMIT TRAN
    END
END;
GO



DROP PROCEDURE IF EXISTS dbo.Insert_NewStage1;
GO
CREATE PROCEDURE dbo.Insert_NewStage1
    @Id int,
    @Rows int
WITH NATIVE_COMPILATION, SCHEMABINDING  
AS   
BEGIN ATOMIC   
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')  

    INSERT INTO dbo.NewStage1 (Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14)
    SELECT Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14 
    FROM
    (
    SELECT
    @Id as Id,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col1,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col2,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col3,
    'aaaaaaaaaaaaaaaaaaaa' as Col4,
    'aaaaaaaaaaaaaaaaaaaa' as Col5,
    'aaaaaaaaaaaaaaaaaaaa' as Col6,
    0 as Col7,
    0 as Col8,
    'aaaaaaaaaaaaaaaaaaaa' as Col9,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col10,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col11,
    1 as Col12,
    1 as Col13,
    1 as Col14
    )a
    CROSS APPLY
    (
    SELECT TOP (@Rows) n FROM dbo.Numbers
    )b


END;   
GO  

DROP PROCEDURE IF EXISTS dbo.Delete_NewStage1;
GO
CREATE PROCEDURE dbo.Delete_NewStage1
    @Id int 
WITH NATIVE_COMPILATION, SCHEMABINDING  
AS   
BEGIN ATOMIC   
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')  

    DELETE FROM dbo.NewStage1 WHERE Id = @Id;

END   
GO  

A few more things to note:

  • My new procedures are natively compiled: SQL Server compiles them up front so at run time it can just execute without any extra steps.  The procedures that target my old disk-based tables will have to compile every time.  (A quick way to check which modules are natively compiled is shown after this list.)
  • In the old delete procedure, I am deleting data in chunks so my transaction log doesn't get full.  In the new version of the procedure, I don't have to worry about this because, as I mentioned earlier, my memory optimized table doesn't have to use the transaction log.
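If you want to confirm which modules actually got natively compiled, sys.sql_modules exposes a flag for it (a quick sanity check, not part of the original ETL):

SELECT
    OBJECT_NAME(object_id) AS ModuleName,
    uses_native_compilation
FROM sys.sql_modules
WHERE uses_native_compilation = 1;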

Let's simulate a load

It's time to see if all of this fancy in-memory stuff is actually worth all of the restrictions.

In my load, I'm going to mimic loading three documents with around 3 million rows each.  Then, I'm going to delete the second document from each table:

-- Old on-disk method
-- Insert data for processing
EXEC InMemoryTest.dbo.Insert_OldStage1 @Id=1, @Rows=2500000;
GO
EXEC InMemoryTest.dbo.Insert_OldStage1 @Id=2, @Rows=3400000;
GO 
EXEC InMemoryTest.dbo.Insert_OldStage1 @Id=3, @Rows=2800000;
GO 

-- Delete set of records after processed
EXEC InMemoryTest.dbo.Delete_OldStage1 @Id = 2
GO

-- New in-memory method
-- Insert data for processing
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=1, @Rows=2500000;
GO
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=2, @Rows=3400000;
GO 
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=3, @Rows=2800000;
GO 

-- Delete set of records after processed
EXEC InMemoryTest.dbo.Delete_NewStage1 @Id = 2
GO
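If you want to time the two approaches yourself, wrapping each step in a simple elapsed-time check does the job (a minimal sketch; SET STATISTICS TIME ON works just as well):

DECLARE @start datetime2 = SYSDATETIME();

-- run whichever step you want to measure here, for example:
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=1, @Rows=2500000;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) / 1000.0 AS ElapsedSeconds;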

The in-memory version should have a significant advantage because:

  1. The natively compiled procedure is precompiled (shouldn't be a huge deal here since we are doing everything in a single INSERT INTO...SELECT).
  2. The in-memory table inserts/deletes don't have to write to the transaction log (this should be huge!)

Results

  --------------------   --------------   --------------
                         Disk-based       In-Memory
  INSERT 3 documents     65 sec           6 sec
  DELETE 1 document      46 sec           0 sec
  Total time             111 sec          6 sec
  Difference             -95% slower      1750% faster
  --------------------   --------------   --------------

The results speak for themselves.  In this particular example, in-memory blows the disk-based solution out of the water.

Obviously there are downsides to in-memory (like consuming a lot of memory) but if you are going for pure speed, there's nothing faster.
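If memory consumption is the concern, you can at least keep an eye on how much each memory-optimized table is using (a quick check via sys.dm_db_xtp_table_memory_stats):

SELECT
    OBJECT_NAME(object_id) AS TableName,
    memory_used_by_table_kb / 1024 AS UsedMB,
    memory_allocated_for_table_kb / 1024 AS AllocatedMB
FROM sys.dm_db_xtp_table_memory_stats;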

Warning! I am not you.

And you are not me.

While in-memory works great for my ETL scenario, there are many requirements and limitations.  It's not going to work in every scenario.  Be sure you understand the in-memory durability options to prevent any potential data loss and try it out for yourself!  You might be surprised by the performance gains you'll see.