The Forgotten Fourth SQL Server Recovery Model

Watch this week's video on YouTube

SQL Server recovery models define when database transactions are written to the transaction log.  Understanding these models is critical for backup and recovery purposes, as well as for understanding how they impact query performance.

Today we'll examine the differences between SQL Server's three official recovery models as well as an unofficial "fourth" recovery model that won't help in backup/recovery, but will help in performance of certain processes.

Full Recovery

The full recovery model is the only one that can potentially save all of your data when something bad happens (NOTE: "potentially" because if you aren't taking frequent enough backups, or testing them, you might still experience data loss).

Under the full recovery model, every transaction is written to the transaction log first and then persisted to the actual database.  This means if something disastrous happens to your server, as long as the change made its way into the transaction log AND your transaction log is readable AND your previous full/differential/log backups can be restored, you shouldn't experience any data loss (there are a lot of assumptions behind that statement though, so don't use this post as your only data loss prevention guide).

From a performance standpoint, full recovery is the slowest of the bunch because every transaction needs to be logged, and that creates some overhead.  That's fine for your OLTP databases, but maybe not so much for your analytical staging databases (assuming you can recreate that data).
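For reference, switching a database between recovery models is a single ALTER DATABASE statement.  Here's a quick sketch (the database name is hypothetical):

-- Check the current recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'SalesDB';

-- Switch to full recovery (take a full backup afterwards to start the log chain)
ALTER DATABASE SalesDB SET RECOVERY FULL;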

Simple Recovery

While some people incorrectly believe that simple recovery means no writing to the transaction log (need proof that a database in simple recovery still writes to the transaction log? Try running ROLLBACK TRANSACTION after a huge delete), it actually means that the transaction log is cleared as soon as SQL Server is done using it and the data has made its way to disk.
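If you want to try that rollback experiment yourself, a minimal sketch looks like this (database and table names are made up):

ALTER DATABASE StagingDB SET RECOVERY SIMPLE;
GO
USE StagingDB;
GO
BEGIN TRANSACTION;
    -- Even under simple recovery, every row deleted here is logged...
    DELETE FROM dbo.BigStagingTable;
-- ...which is exactly what makes this rollback possible
ROLLBACK TRANSACTION;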

Because the transaction log is cleared regularly, its space is constantly reused and your overall log size can stay smaller.  Additionally, since that space is cleared, you don't have to worry about backing it up.

No persistence of the transaction log means you won't be able to recover all of your data in case of server failure, though.  This is generally OK if you are using simple recovery in databases where it's easy to recreate any data since your last full backup (e.g. staging data where you can easily redo the transactions that were lost).

Simple recovery minimally logs as many operations as possible, making throughput faster.  This works well in staging databases and for ETLs where data is in flux and can be easily recreated.

Bulk-Logged Recovery

If Goldilocks thinks the full recovery model has too much logging, and the simple model not enough logging, then she'll find the amount of logging in the bulk-logged recovery model to be just right.

Under bulk-logged, most transactions are fully logged, allowing for data restoration of those fully logged transactions if the need arises.  Bulk operations, however, are minimally logged, allowing for better performance of things like bulk inserts (at the cost of some restore flexibility, described below).

While restorations under the bulk-logged recovery model aren't as flexible as full recovery (e.g. if a transaction log backup contains any bulk operations, you have to restore that whole log backup instead of stopping at a specific point in time), it does allow full logging for when most people need it and minimal logging for when most people don't.  In certain situations you really can have your cake and eat it too!
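As a rough sketch of how this gets used in practice (database, table, and file paths are all hypothetical), you might flip to bulk-logged just for a big load and then switch back:

-- Minimize logging for the duration of a large load
ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;

BULK INSERT dbo.SalesStaging
FROM 'C:\loads\sales.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- Switch back and take a log backup to regain point-in-time restore
ALTER DATABASE SalesDB SET RECOVERY FULL;
BACKUP LOG SalesDB TO DISK = 'C:\backups\SalesDB_log.trn';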

The Fourth Unofficial Recovery Model: In-Memory SCHEMA_ONLY Durability

The SCHEMA_ONLY durability setting on a memory optimized table isn't a recovery model.  But it does behave a little bit like a recovery model in the sense that it defines how operations against your memory optimized table interact with your transaction log:

They don't.  Well, almost.

And that's the beauty of it, at least from a performance standpoint.  If you are willing to trade off the ability to recover data for performance, then SCHEMA_ONLY durability fits the bill - so long, transaction log overhead!

So while none of the official recovery models allow you to prevent writing to the transaction log, the SCHEMA_ONLY durability setting does!
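If you want to see which of your memory optimized tables are set up this way, a simple catalog query does the trick (just a sketch; it assumes the current database already has memory optimized tables):

-- durability_desc shows SCHEMA_ONLY vs SCHEMA_AND_DATA for each memory optimized table
SELECT name, durability_desc
FROM sys.tables
WHERE is_memory_optimized = 1;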

In-Memory OLTP: A Case Study

Watch this week's video on YouTube

When In-Memory OLTP was first released in SQL Server 2014, I was excited to start using it.  All I could think was "my queries are going to run so FAST!"

Well, I never got around to implementing In-Memory OLTP.  Besides having an incompatible version of SQL Server at the time, the in-memory features had too many limitations for my specific use cases.

Fast forward a few years, and I've done nothing with In-Memory OLTP.  Nothing that is until I saw Erin Stellato present at our Northern Ohio SQL Server User Group a few weeks ago - her presentation inspired me to take a look at In-Memory OLTP again to see if I could use it.

Use case: Improving ETL staging loads

After being refreshed on the ins and outs of in-memory SQL Server, I wanted to see if I could apply some of the techniques to one of my ETLs.

The ETL consists of two major steps:

  1. Shred documents into row/column data and then dump that data into a staging table.
  2. Delete some of the documents from the staging table.

In the real world, there's a step 1.5 that does some processing of the data, but it's not relevant to these in-memory OLTP demos.

So step one was to create my staging tables.  The memory optimized table is called "NewStage1" and the traditional disk-based table is called "OldStage1":

DROP DATABASE IF EXISTS InMemoryTest;
GO
CREATE DATABASE InMemoryTest;
GO
USE InMemoryTest;
GO

ALTER DATABASE InMemoryTest ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA;
GO 
ALTER DATABASE InMemoryTest ADD FILE (name='imoltp_mod1', filename='C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA\imoltp_mod1') TO FILEGROUP imoltp_mod;
GO  
ALTER DATABASE InMemoryTest SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON;
GO 
ALTER DATABASE InMemoryTest SET RECOVERY SIMPLE
GO

--Numbers Table
-- This needs to be in-memory to be called from a natively compiled procedure
DROP TABLE IF EXISTS InMemoryTest.dbo.Numbers;
GO
CREATE TABLE InMemoryTest.dbo.Numbers
(
    n int
    INDEX ix_n NONCLUSTERED HASH (n) WITH (BUCKET_COUNT=400000)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY);  
GO 

INSERT INTO dbo.Numbers (n)
SELECT TOP (4000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

-- Set up on-disk tables
DROP TABLE IF EXISTS InMemoryTest.dbo.OldStage1;
GO
CREATE TABLE InMemoryTest.dbo.OldStage1
(
    Id int,
    Col1 uniqueidentifier,
    Col2 uniqueidentifier,
    Col3 varchar(1000),
    Col4 varchar(50),
    Col5 varchar(50),
    Col6 varchar(50),
    Col7 int,
    Col8 int,
    Col9 varchar(50),
    Col10 varchar(900),
    Col11 varchar(900),
    Col12 int,
    Col13 int,
    Col14 bit
);
GO
CREATE CLUSTERED INDEX CL_Id ON InMemoryTest.dbo.OldStage1 (Id);
GO


--  Set up in-memory tables and natively compiled procedures
DROP TABLE IF EXISTS InMemoryTest.dbo.NewStage1;
GO
CREATE TABLE InMemoryTest.dbo.NewStage1
(
    Id int,
    Col1 uniqueidentifier,
    Col2 uniqueidentifier,
    Col3 varchar(1000),
    Col4 varchar(50),
    Col5 varchar(50),
    Col6 varchar(50),
    Col7 int,
    Col8 int,
    Col9 varchar(50),
    Col10 varchar(900),
    Col11 varchar(900),
    Col12 int,
    Col13 int,
    Col14 bit
    INDEX ix_id NONCLUSTERED HASH (id) WITH (BUCKET_COUNT=10)
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_ONLY);  
GO

A few things to keep in mind:

  • The tables have the same columns and datatypes, with the only difference being that the NewStage1 table is memory optimized.
  • My database is using simple recovery so I am able to perform minimal logging/bulk operations on my disk-based table.
  • Additionally, I'm using the SCHEMA_ONLY durability setting.  This gives me outstanding performance because there is no writing to the transaction log!  However, this means if I lose my in-memory data for any reason (crash, restart, corruption, etc...) I am completely out of luck.  This is fine for my staging data scenario since I can easily recreate the data if necessary.

Inserting and deleting data

Next I'm going to create procedures for inserting data into and deleting data from both my new and old staging tables:

DROP PROCEDURE IF EXISTS dbo.Insert_OldStage1;
GO
CREATE PROCEDURE dbo.Insert_OldStage1
    @Id int,
    @Rows int
AS
BEGIN
    INSERT INTO InMemoryTest.dbo.OldStage1 (Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14)
    SELECT Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14
    FROM
    (
    SELECT
    @Id as Id,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col1,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col2,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col3,
    'aaaaaaaaaaaaaaaaaaaa' as Col4,
    'aaaaaaaaaaaaaaaaaaaa' as Col5,
    'aaaaaaaaaaaaaaaaaaaa' as Col6,
    0 as Col7,
    0 as Col8,
    'aaaaaaaaaaaaaaaaaaaa' as Col9,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col10,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col11,
    1 as Col12,
    1 as Col13,
    1 as Col14
    )a
    CROSS APPLY
    (
    SELECT TOP (@Rows) n FROM dbo.Numbers
    )b
END

DROP PROCEDURE IF EXISTS dbo.Delete_OldStage1;
GO
CREATE PROCEDURE dbo.Delete_OldStage1
    @Id int
AS
BEGIN
    -- Use loop to delete to prevent filling transaction log
    DECLARE 
        @Count int = 0,
        @for_delete int,
        @chunk_size int = 1000000

    SELECT @for_delete = COUNT(Id) FROM InMemoryTest.dbo.OldStage1 
                            WHERE Id = @Id;

    WHILE (@Count < @for_delete)
    BEGIN
        SELECT @Count = @Count + @chunk_size;

        BEGIN TRAN
            DELETE TOP(@chunk_size) FROM InMemoryTest.dbo.OldStage1 WHERE Id = @Id
        COMMIT TRAN
    END
END;
GO



DROP PROCEDURE IF EXISTS dbo.Insert_NewStage1;
GO
CREATE PROCEDURE dbo.Insert_NewStage1
    @Id int,
    @Rows int
WITH NATIVE_COMPILATION, SCHEMABINDING  
AS   
BEGIN ATOMIC   
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')  

    INSERT INTO dbo.NewStage1 (Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14)
    SELECT Id, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14 
    FROM
    (
    SELECT
    @Id as Id,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col1,
    '92D14DA3-2C55-4E50-A965-7D3C941417B3' as Col2,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col3,
    'aaaaaaaaaaaaaaaaaaaa' as Col4,
    'aaaaaaaaaaaaaaaaaaaa' as Col5,
    'aaaaaaaaaaaaaaaaaaaa' as Col6,
    0 as Col7,
    0 as Col8,
    'aaaaaaaaaaaaaaaaaaaa' as Col9,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col10,
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' as Col11,
    1 as Col12,
    1 as Col13,
    1 as Col14
    )a
    CROSS APPLY
    (
    SELECT TOP (@Rows) n FROM dbo.Numbers
    )b


END;   
GO  

DROP PROCEDURE IF EXISTS dbo.Delete_NewStage1;
GO
CREATE PROCEDURE dbo.Delete_NewStage1
    @Id int 
WITH NATIVE_COMPILATION, SCHEMABINDING  
AS   
BEGIN ATOMIC   
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')  

    DELETE FROM dbo.NewStage1 WHERE Id = @Id;

END   
GO  

A few more things to note:

  • My new procedures are natively compiled: SQL Server compiles them up front so at run time it can just execute without any extra steps.  The procedures that target my old disk-based tables are interpreted T-SQL, which goes through the normal parse, compile, and optimize process at runtime.
  • In the old delete procedure, I am deleting data in chunks so my transaction log doesn't fill up.  In the new version of the procedure, I don't have to worry about this because, as I mentioned earlier, my memory optimized table doesn't have to use the transaction log.

Let's simulate a load

It's time to see if all of this fancy in-memory stuff is actually worth all of the restrictions.

In my load, I'm going to mimic loading three documents with around 3 million rows each.  Then, I'm going to delete the second document from each table:

-- Old on-disk method
-- Insert data for processing
EXEC InMemoryTest.dbo.Insert_OldStage1 @Id=1, @Rows=2500000;
GO
EXEC InMemoryTest.dbo.Insert_OldStage1 @Id=2, @Rows=3400000;
GO 
EXEC InMemoryTest.dbo.Insert_OldStage1 @Id=3, @Rows=2800000;
GO 

-- Delete set of records after processed
EXEC InMemoryTest.dbo.Delete_OldStage1 @Id = 2
GO

-- New in-memory method
-- Insert data for processing
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=1, @Rows=2500000;
GO
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=2, @Rows=3400000;
GO 
EXEC InMemoryTest.dbo.Insert_NewStage1 @Id=3, @Rows=2800000;
GO 

-- Delete set of records after processed
EXEC InMemoryTest.dbo.Delete_NewStage1 @Id = 2
GO

The in-memory version should have a significant advantage because:

  1. The natively compiled procedure is precompiled (shouldn't be a huge deal here since we are doing everything in a single INSERT INTO...SELECT).
  2. The in-memory table inserts/deletes don't have to write to the transaction log (this should be huge!)
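Before looking at the numbers, one way to sanity-check that second claim is to watch log space usage around an in-memory insert.  A rough sketch, using the InMemoryTest database created above:

USE InMemoryTest;
GO
-- Snapshot log usage, run an in-memory insert, then compare
SELECT used_log_space_in_bytes FROM sys.dm_db_log_space_usage;

EXEC dbo.Insert_NewStage1 @Id = 99, @Rows = 1000000;

SELECT used_log_space_in_bytes FROM sys.dm_db_log_space_usage;
-- Expect barely any growth for the SCHEMA_ONLY table;
-- repeat with dbo.Insert_OldStage1 to see the difference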

Results

                         Disk-based      In-Memory
  INSERT 3 documents     65 sec          6 sec
  DELETE 1 document      46 sec          0 sec
  Total time             111 sec         6 sec
  Difference             -95% slower     1750% faster

The results speak for themselves.  In this particular example, in-memory blows the disk-based solution out of the water.

Obviously there are downsides to in-memory (like consuming a lot of memory) but if you are going for pure speed, there's nothing faster.

Warning! I am not you.

And you are not me.

While in-memory works great for my ETL scenario, there are many requirements and limitations.  It's not going to work in every scenario.  Be sure you understand the in-memory durability options to prevent any potential data loss and try it out for yourself!  You might be surprised by the performance gains you'll see.

Full Automatic Tuning: SQL Server 2026's Most Killer Feature

This post is a response to this month's T-SQL Tuesday #100 prompt by the creator of T-SQL Tuesday himself, Adam Machanic.  T-SQL Tuesday is a way for SQL Server bloggers to share ideas about a different database or professional topic every month.

This month I'm going down the science fiction route and pretending that I'm writing about a new SQL Server feature in 2026.


Fully Automatic Tuning

Watch this week's video on YouTube

I was really excited when automatic tuning capabilities were first introduced in SQL Server 2017.  I couldn't wait to say so long to the days where I had to spend time fixing basic, repetitive query tuning problems.

And while those first versions of automatic plan choice corrections were fine, there was a lot left to be desired...

Fortunately, Microsoft has fully leveraged its built-in R and Python services to allow for advanced automatic tuning to make the life of SQL Server DBAs and developers that much easier.

On By Default

Perhaps the coolest part of these new automatic tuning capabilities is that they are on by default.  What this means is that databases will seem to perform better right out of the box without any kind of intervention.

I think the fact that Microsoft is enabling this by default in the on-premises version of SQL Server shows how confident they are in the capabilities of these features.

Optimize For Memory and Data Skew

While the first iterations of automatic query tuning involved swapping out query plans when SQL Server found a regression in CPU performance, the new automatic plan correction is able to factor in many more elements.

For example, instead of optimizing for CPU usage, setting the new flag OPTIMIZE_FOR_MEMORY = ON allows SQL Server to minimize memory usage instead.

Also, with the addition of the new  "Optimized" cardinality estimator (so now we have "Legacy", "New", and "Optimized" -  who's in charge of naming these things???) SQL Server is able to swap out different estimators at the query level in order to generate better execution plans!

What time is it?

Another new addition to automatic plan corrections is SQL Server's ability to choose an appropriate execution plan based on historical time-of-day server usage.

So if a query is executing during a lull period on the server, SQL Server is intelligent enough to realize this and choose a plan that is more resource intensive.  This means faster query executions at the cost of a more intensive plan - but that's OK since the server isn't being fully utilized during those times anyway.

Making use of hardware sensors

As the world continues to include more data collecting sensors everywhere, SQL Server makes good use of these data points in 2026.

Tying into the server's CPU and motherboard temperature sensors, SQL Server is able to negotiate with the OS and hardware to allow for dynamic CPU overclocking based on server demands.

While this option is not turned on by default, enabling dynamic overclocking allows for SQL Server to give itself a CPU processing boost when necessary, and then dial back down to more stable levels once finished.

This obviously won't be a feature used by everybody, but users who are willing to trade off some stability for additional analytical processing performance will love this feature.

How I Stopped Worrying And Learned To Love Automatic Tuning

At the end of the day, we are our own worst enemies.  Even with the latest and greatest AI technology, we are capable of writing queries so terrible that even the smartest machine learning algorithms can't make sense of them.

While SQL Server's automatic tuning features work wonderfully on the boring and mundane performance problems, there are still plenty of performance problems that it leaves for us to troubleshoot.

And I love that.  Let the software optimize itself and maintain a "good enough" baseline while letting me play with the really fun performance problems.

I'm sure these features will continue to evolve - but so will we, working on new problems and facing new data challenges.

How To Make Your Queries Perform Like They Used To


Photo by Chad Kirchoff on Unsplash

Watch this week's video on YouTube

In the ideal world, you fully test how your SQL Server will handle upgrading to the latest version.  You're able to catch and mitigate any performance surprises early on.

In the real world, you might not be able to perform a good test before upgrading.  You only learn about poor performance once you upgrade in production.

Does this scenario sound familiar?  It's what I've been thinking about recently as I have to deal with query performance issues after upgrading to SQL Server 2016.

Coming from SQL Server 2012, I know the new cardinality estimator added in 2014 is a major source of the problems I'm experiencing.  While the new cardinality estimator improves the performance of some queries, it has also made some of my queries take hours to run instead of seconds.

Long-term, the solution is to revisit these queries, their stats, and the execution plans being generated for them to see what can be rewritten for better performance.

But ain't nobody got time for that (at least when facing performance-crippling production issues).

Short-term, put a band-aid on to stop the bleeding.

I could change the compatibility mode of the database to revert back to SQL Server 2012 (before the new cardinality estimator was introduced), but then I miss out on being able to use all of the new SQL Server 2016 improvements just because a few queries are misbehaving.
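For reference, that database-wide rollback is a one-liner (the database name here is hypothetical; compatibility level 110 corresponds to SQL Server 2012):

ALTER DATABASE MyAppDB SET COMPATIBILITY_LEVEL = 110;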

I could enable trace flag 9481 to have my queries use the old cardinality estimator, however as a developer I probably don't have access to play with trace flags (and for good reason).

Starting with SQL Server 2016 SP1, what I can do is use the legacy cardinality estimator query hint:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
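Here's a minimal sketch of what the hint looks like attached to a full query (the table and column names are made up):

SELECT o.OrderId, o.OrderDate
FROM dbo.Orders AS o
WHERE o.CustomerId = 42
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));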

This hint is great because it doesn't require developers to have any special permissions.  It also allows SQL Server to use the old cardinality estimator for poorly performing queries only - the rest of the server gets to benefit from the improvements brought on by the new cardinality estimator.

With time, I can revisit the queries that are using this hint to try to improve their performance, but in the meantime it's a great way to fix regressed query performance due to the new cardinality estimator.

How to Safely Parameterize Table Names

Protecting against SQL Injection Part 2


Watch this week's video on YouTube

Last week we talked about building dynamic SQL queries and how doing so might leave you open to SQL injection attacks.

In that post we examined how using sp_executesql to parameterize our dynamic SQL statements protects us.

Today, we are going to look at where sp_executesql falls short.

The problem with sp_executesql

Let's modify last week's final query to make our table name dynamic:

CREATE PROCEDURE dbo.sp_GetFullNameFromTable
        @ParmTableName varchar(100),
        @ParmUserName varchar(100)
AS
BEGIN
    DECLARE @FullQuery nvarchar(1000)
    SET @FullQuery = N'SELECT FullName FROM dbo.@TableName WHERE UserName = @UserName'

    DECLARE @ParmDefinition nvarchar(100) = N'@TableName varchar(100), @UserName varchar(100)';  

    EXEC sp_executesql @FullQuery, @ParmDefinition,  
                      @UserName = @ParmUserName,
                      @TableName = @ParmTableName;  

END

(there's an entirely different discussion to be had about whether you should allow table and column names to be dynamic, but we'll assume we need a dynamic table name for the rest of this post)

If we pass in a table name parameter value and execute this query, we'll be greeted with an error.

Yeah, sp_executesql doesn't like parameterizing table names.

So how do we make the table name dynamic and safe?

Unfortunately we have to fall back on SQL's EXEC command.

However, like we discussed last week, we need to be vigilant about what kind of user input we allow to be built as part of our query.

Assuming our app layer is already sanitizing as much of the user input as possible, here are some precautions we can take on the SQL side of the equation:

QUOTENAME()

If we wrap our user input parameters with QUOTENAME(), we'll achieve safety:

CREATE PROCEDURE dbo.sp_GetFullNameFromTableSanitized
    @ParmTableName varchar(100),
    @ParmUserName varchar(100)
AS
BEGIN
    DECLARE @FullQuery nvarchar(1000)
    SET @FullQuery = N'SELECT FullName FROM dbo.' + QUOTENAME(@ParmTableName) + ' WHERE UserName = @UserName'

    DECLARE @ParmDefinition nvarchar(100) = N'@UserName varchar(100)';  

    EXEC sp_executesql @FullQuery, @ParmDefinition,  
                      @UserName = @ParmUserName

END

This results in the query executing successfully, with the table name safely delimited in brackets.

Although QUOTENAME() works well by adding brackets (by default) to sanitize quotes, its downside is that it only accepts input strings up to 128 characters long (anything longer returns NULL). If you are expecting parameter values longer than that, you will have to use something like REPLACE(@TableName,'''','''''') instead to delimit single quotes (however, rolling your own logic like this is really hard to do securely and not recommended).
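To see why the brackets help, here's a quick sketch of what QUOTENAME() does to a malicious value (the input string is just an example):

SELECT QUOTENAME('Users]; DROP TABLE dbo.Users; --');
-- Returns: [Users]]; DROP TABLE dbo.Users; --]
-- The embedded ] is doubled, so the value can't close the identifier early;
-- the whole string is treated as one (non-existent) table name instead of extra SQL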

EXECUTE AS

The account running any dynamic SQL queries should be locked down so that it won't be able to perform any operations you don't want it to do.

Taking this idea one step further, you can create another account with very limited permissions, and add EXECUTE AS to your stored procedure to run that code under the very limited account.

CREATE PROCEDURE dbo.sp_GetFullNameFromTableSanitized
    @ParmTableName varchar(100),
    @ParmUserName varchar(100)
    WITH EXECUTE AS 'LimitedUser'
AS
BEGIN
...

This won't prevent injection, but it will limit what the malicious user is able to do.
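As a rough sketch of that setup (the user and table names are hypothetical), the limited account might be created like this:

-- A loginless user with read-only access to just the tables the procedure needs
CREATE USER LimitedUser WITHOUT LOGIN;
GRANT SELECT ON dbo.Users TO LimitedUser;
-- No INSERT/UPDATE/DELETE or DDL permissions are granted, so injected statements
-- that try to modify data or schema will fail under this security context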