Data with Bert logo

How NOLOCK Will Block Your Queries

k Photo by James Sutton on Unsplash

Note: the problem described below applies to all SELECT queries, not just those adorned with NOLOCK hints.  The fact that it applies to NOLOCK queries was a huge surprise to me though, hence the title.

Lots of people don't like NOLOCK (i.e. the read uncommitted isolation level) because it can return inaccurate data.  I've seen plenty of arguments cautioning developers from retrieving uncommitted reads because of how they can return dirty data, phantom reads, and non-repeatable reads.

I've known about all of those above problems, but there's one problem that I've never heard of until recently: NOLOCK can block other queries from running.

Watch this week's video on YouTube

Let's step back and understand why I've so often used NOLOCK in the past.  A fairly typical instance of when I use NOLOCK is when I want to let a query run overnight to return some large set of data.  I'm okay with some inconsistencies in the data (from dirty reads, etc...).  My primary concern is that I don't want the long running query to get in the way of other processes.

I always thought NOLOCK was a perfect solution for this scenario because it never locks the data that it reads - the results might not be perfect, but at least the query won't negatively impact any other process on the server.

This is where my understanding of NOLOCK was wrong: while NOLOCK won't lock row level data, it will take out a schema stability lock.

A schema stability (Sch-S) lock prevents the structure of a table from changing while the query is executing.  All SELECT statements, including those in the read uncommitted/NOLOCK isolation level, take out a Sch-S lock.  This makes sense because we wouldn't want to start reading data from a table and then have the column structure change half way through the data retrieval.

However, this also means there might be some operations that get blocked by a Sch-S lock.  For example, any command requesting a schema modification (Sch-M) lock gets blocked in this scenario.

What commands request Sch-M locks?

Things like an index REBUILD or sp_recompile table.  These are the types of commands running in my nightly maintenance jobs that I was trying to avoid hurting by using NOLOCK in the first place!

To reiterate, I used to think that using the NOLOCK hint was a great way to prevent blocking during long running queries.  However, it turns out that my NOLOCK queries were actually blocking my nightly index jobs (all SELECT queries block in this example, but I find the NOLOCK to be particularly misleading), which then caused other SELECT statements to get blocked too!

Let's take a look at this in action.  Here I have a query that creates a database, table, and then runs a long running query with NOLOCK:

DROP DATABASE IF EXISTS [Sandbox]
GO
CREATE DATABASE [Sandbox]
GO
USE [Sandbox]
GO

DROP TABLE IF EXISTS dbo.Test
CREATE TABLE dbo.Test
(
    c0 int IDENTITY PRIMARY KEY,
    c1 varchar(700) default REPLICATE('a',700)
)

CREATE NONCLUSTERED INDEX IX_Id ON dbo.Test (c1);
GO

INSERT INTO dbo.Test DEFAULT VALUES;
GO 1000


-- Read a billion records
SELECT * 
FROM 
    dbo.Test t1 (NOLOCK) 
    CROSS JOIN dbo.Test t2 (NOLOCK) 
    CROSS JOIN dbo.Test t3 (NOLOCK) 

Now, while that billion row read is occurring, we can verify that the query took out a Sch-S lock by looking at sys.dm_tran_locks:

SELECT *
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'

Sch-S lock granted

While that's running, if we try to rebuild an index, that rebuild is blocked (shown as a WAIT):

USE [Sandbox]
GO

ALTER INDEX IX_Id ON dbo.Test REBUILD

rebuild is blocked

Our index rebuild query will remain blocked until our billion row NOLOCK SELECT query finishes running (or is killed).  This means the query that I intended to be completely unobtrusive is now blocking my nightly index maintenance job from running.

Even worse, any other queries that try to run after the REBUILD query (or any other commands that request a Sch-M lock) are going to get blocked as well!  If I try to run a simple COUNT(*) query:

USE [Sandbox]
GO

SELECT COUNT(*) FROM dbo.Test

chained blocks

Blocked!  This means that not only is my initial NOLOCK query causing my index REBUILD maintenance jobs to wait, the Sch-M lock placed by the REBUILD maintenance job is causing any subsequent queries on that table to get blocked and be forced to wait as well.  I just derailed the timeliness of my maintenance job and subsequent queries with a blocking NOLOCK statement!

Solutions

Unfortunately this is a tough problem and there's no one-size-fits-all remedy.

Solution #1: Don't run long running queries

I could avoid running long queries at night when they might run into my index maintenance jobs.  This would prevent those index maintenance jobs and subsequent queries from getting delayed, but it means my initial billion row select query would then have to run earlier, negatively impacting server performance during a potentially busier time of day.

Solution #2: Use WAIT_AT_LOW_PRIORITY

Starting in 2014, I could do an online index rebuild with the WAIT_AT_LOW_PRIORITY option set:

ALTER INDEX IX_Id ON dbo.Test REBUILD 
WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES , ABORT_AFTER_WAIT = BLOCKERS)))

This query basically gives any blocking SELECT queries currently running 1 minute to finish executing or else this query will kill them and then execute the index rebuild.  Alternatively we could have also set ABORT_AFTER_WAIT = SELF and the rebuild query would kill itself, allowing the NOLOCK billion row SELECT to finish running and not preventing any other queries from running.

This is not a great solution because it means either the long running query gets killed or the index REBUILD gets killed.

Solution #3: REBUILD if no Sch-S, REORGANIZE otherwise

A programmatic solution can be written that tries to REBUILD the index, but falls back to REORGANIZE if it knows it will have to wait for a Sch-M lock.

I've created the boiler plate below as a starting point, but the sky is the limit with what you can do with it (e.g. create a WHILE loop to check for the lock every x seconds, create a timeout for when the script should stop trying to REBUILD and just REORGANIZE instead, etc...)

-- Idea for how to rebuild/reorganize based on a schema stability lock.
-- More of a starting point than fully functional code.
-- Not fully tested, you have been warned!
DECLARE 
    @TableName varchar(128) = 'Test',
    @HasSchemaStabilityLock bit = 0

SELECT TOP 1 @HasSchemaStabilityLock = 
    CASE WHEN l.request_mode IS NOT NULL THEN 1 ELSE 0 END
    FROM
        sys.dm_tran_locks as l
    WHERE
        l.resource_type = 'OBJECT'
        AND l.request_mode = 'Sch-S'
        AND l.request_type = 'LOCK'
        AND l.request_status = 'GRANT'
        AND OBJECT_NAME(l.resource_associated_entity_id) = @TableName

IF @HasSchemaStabilityLock = 0
BEGIN
    -- Perform a rebuild
    ALTER INDEX IX_Id ON dbo.Test REBUILD
    PRINT 'Index rebuilt'
END
ELSE
BEGIN
    -- Perform a REORG
    ALTER INDEX IX_Id ON dbo.Test REORGANIZE
    PRINT 'Index reorganized'
END

This solution is my favorite because:

  1. Ad hoc long running queries don't get killed (all of that time spent processing doesn't go to waste)
  2. Other select queries are not blocked by the Sch-M lock attempt by REBUILD
  3. Index maintenance still occurs, even if it ends up being a REORGANIZE instead of a REBUILD

3 Tips You Need To Know When Using PowerShell with SQL Server

Watch this week's video on YouTube

Have you ever had to perform repetitive tasks in SQL Server?

Maybe you've had to manually verify backups, script out all of a server's logins/groups/permissions, or refresh a dev environment with data.  With PowerShell, you can automate all of these manual tasks...and more!

This week I had the opportunity to interview PowerShell expert Drew Furgiuele and learn his three favorite tips for using PowerShell with SQL Server.

Whether you are just getting started with PowerShell or have already written some automation scripts, you'll want to be sure you are following Drew's advice.

So if you haven't already, go grab the SqlServer module and get busy scripting in PowerShell today!

Clustered vs Nonclustered: Index Fundamentals You Need To Know

patrick-tomasso-71909 Photo by Patrick Tomasso on Unsplash

How many times have you known that adding an index would improve query performance but you weren't exactly sure which type of index to add?

This happened to me all the time in my first few years (and maybe an extra year or two after that) of working with SQL Server.

Today I want to help alleviate some of that confusion by comparing two of the most common index types: clustered and nonclustered rowstore indexes.

Watch this week's video on YouTube

Clustered Indexes

Every table's data has some natural order to it.

If the order is random and not explicitly defined then that table is known as a heap.  With the exception of a few special cases, we generally don't want to have heaps.  Heaps don't perform well for the majority of queries becauase SQL Server has no meta knowledge about where data is stored within a heap.

If we don't have a random heap, that means we have defined the order that data should be stored in a table. ** The physical storage order of our data is defined by our clustered index.**

Every table can have exactly one clustered index because the data in a table can only be stored in one order i.e. you can't have that table's data physically stored on the disk in more than one order.

What are the benefits of a clustered index?

The data in a clustered index is stored in order.  That means:

  1. Finding the data you need in your clustered index is a matter of knowing where to look in our alphabetical list of data.  Computers are really good at doing this.
  2. If your data needs to be outputted in the same order that it's stored in - presto! - SQL doesn't need to do any additional sorting.

A lot of people like to put the clustered index on their table's primary key (PK).  This is usually fine because a lot of the time our primary key is likely to be our most used field for joins, where statements, etc...

Some people think they can ONLY put their clustered index on their PK.  That's not true! Often times it can be much more beneficial to put your clustered index on something that isn't your PK, like a different column that is getting more use than our PK.  For example, you might have an identity column set as your PK, but every query against your table is filtering and sorting on a datetime2 column.  Why store your table in PK order if you are always going to be filtering and returning data on that datetime2 column?  Put that clustered index on the datetime2 column!

The downside to having data stored in this order is that actions like inserts and updates take long because SQL has to put them into the correct sorted order of the table pages - it can't just quickly tack them onto the end.

Another major benefit of a clustered index is that we don't have to "include" any additional data in our index.  All of the data for our row exists right beside our indexed columns.  This is not necessarily true of other index types (see nonclustered indexes below).

Pretend our clustered index is like the white pages of a phone book (note to future SQL developers in 2030 who have no idea what a phonebook is: it's something that stores the names, addresses, and landline phone numbers in your area.  What's a landline?  Oh boy...)

The phone book stores every person's name in alphabetical order, making it easy to look up certain individuals.  Additionally, if we look someone up, we immediately have their address and phone number right their next to their name - no additional searching necessary!

This is a great feature of clustered indexes - if you ever need to retrieve many or all columns from your table, a clustered index will usually be efficient because once it finds the indexed value you are searching on, it doesn't need to go anywhere else to get the remaining data from that row.

Nonclustered Indexes

If a clustered index is like a phone book, a nonclustered index is like the index in the back of a chemistry text book.  The chemistry text book has some natural order to it ("Chapter 1: Matter, Chapter 2: Elements, Chapter 3: Compounds, etc...").  However, this order doesn't help us if we want to look up the location of something specific, like "noble gases".

So what do we do?  We go to the index in the back of the textbook which lists all topics in alphabetical order, making it easy to find the listing for "noble gases" and the page number they are discussed on.  Once we know the page number for noble gases from our index, we can flip to the correct page and get the data we need.

This book index represents our nonclustered index.  A nonclustered index contains the ordered data for the columns specified in that index, with pointers (book page numbers) that tell us where to go to find the rest of the data from that row (flip to the right book page).  That means unlike a clustered index where all data is always present, using a nonclustered index often is a two step process: find the value of interest in the index and then go look up the rest of that row's data from where it actually exists on disk.

What are the benefits of a nonclustered index?

We can have as many nonclustered indexes on our tables as we want (well, we max out at 999).  That's great! Create an index for every column!

Well, no, don't do that.  There's overhead in creating nonclustered indexes.  Essentially, every time you index some column(s), you are duplicating the unique values in those column(s) so that they can be stored in sorted order in your index.  We get speed and efficiency in our data lookups, but with the cost of losing disk space.  You need to test and see for each table and set of queries what the optimal number of indexes is.  Adding an additional index can absolutely destroy performance, so always test your changes!

Additionally, using a nonclustered index to find an indexed column's value is fast (SQL is just going through the ordered index data to find the value it needs - once again, something computers are really good at doing).  However, if you need other columns of data from the row that you just looked up, SQL is going to have to use those index pointers to go find the rest of that row data somewhere else on disk.  This can really add up and slow down performance.

If those additional lookups are hurting performance, what you can do is INCLUDE your nonindexed columns in your nonclustered index.  What this basically does is in addition to storing the sorted values of your indexed column(s), the index will also store whatever additional values you want to include as part of the index itself.  Once again, you'll probably get better performance because SQL won't have to go to somewhere else on disk to find the data it needs, but you lose storage space because you are creating duplicates of that data as part of your index.

Example Usage Scenarios

[Note: I want to clarify that the above definitions and below examples don't cover lots of corner cases (blob values, fragmentation, etc...).  I wanted this post to be a simple starting point when people don't know what index type they should try adding first, because this was the paralysis that I had when starting out.]{style="color: #808080;"}

[Every statement in this article can probably have an asterisk appended to the end of it, pointing out some example where a recommendation I wrote is 100% wrong.  ALWAYS test your index changes, because what might improve one query may hurt another one already running on that table, and over time you will learn about all of those edge cases and how they affect index performance.]{style="color: #808080;"}

Alright let's take a look at a few common scenarios and what the best index for them might be.  After reading each scenario, take a guess about what kind of index you would add and then click on the answer to reveal what I would do in that scenario.  Assume no indexes exist yet on these tables unless otherwise noted.

  • You have OLTP data that's used only for transactional reads and writing new rows. You know the primary key is an identity integer column.  What type of index would you create for the primary key?
    Clustered index - Your queries are probably always going to be looking up by PK to return data.  If you store the data in the table ordered by that PK, SQL will be able to do this very quickly.  New row additions to the table will always get put at the end because of the auto-incrementing identity column, not creating any overhead for having to insert data in a specific location in the ordered data.
  • You have a query that wants to return most or all of the columns from a table.  What type of index would make this the most efficient?
    Clustered index - Since all of the column values are stored in the same location as your indexed fields, SQL won't have to go do any additional lookups to get all of the data you are requesting from it.  If you created a nonclustered index you would have to INCLUDE all nonindexed columns, which would take up lots of space since you are essentially duplicating your entire table's data.
  • You have a table that is constantly having values updated.  These updated values are used as in your JOINs and WHERE clauses.  What type of index would you add? Nonclustered index - If our values are constantly changing, SQL only has to update the index and pointers while putting the actual data wherever it has available space on disk.  Compare this to a clustered index where it has to put the inserted/updated data in the correct order, meaning potentially lots of operations to shift the data around if available free space doesn't exist at that location.
  • You have a table that already has a clustered index, but it doesn't cover columns in JOINs and WHERE clauses.  What do you do? Nonclustered index - since the clustered index already exists, your only option is to add a nonclustered index.  Depending on the queries hitting this table however, you may want to consider changing your clustered index to a nonclustered index if you think your JOINs and WHERE clauses will be improved by having those fields be part of the clustered index.  Test it out!
  • You have a small staging table that you will always read all rows from and then truncate.  You don't care about the order.  Do you add an index? No, leave it as a heap - This is one scenario where not adding an index can give you better performance since there is no overhead in SQL having to store things in a sorted order or update indexes to specify the order.  If you truly don't care about the order, and you will always be reading all rows from a table and then truncating the table, then it's better not to have the overhead of having indexes on the table.

How To Make Your Queries Perform Like They Used To

chad-kirchoff-202730

Photo by Chad Kirchoff on Unsplash

Watch this week's video on YouTube

In the ideal world, you fully test how your SQL Server will handle upgrading to the latest version.  You're able to catch and mitigate any performance surprises early on.

In the real world, you might not be able to perform a good test before upgrading.  You only learn about poor performance once you upgrade in production.

Does this scenario sound familiar?  It's what I've been thinking about recently as I have to deal with query performance issues after upgrading to SQL Server 2016.

Coming from SQL Server 2012, I know the new cardnality estimator added in 2014 is a major source of the problems I'm experiencing.  While the new cardinality estimator improves performance of some queries, it has also made made some of my queries take hours to run instead of seconds.

Long-term, the solution is to revisit these queries, their stats, and the execution plans being generated for them to see what can be rewritten for better performance.

But ain't nobody got time for that (at least when facing performance crippling production issues).

Short-term, put a band-aid on to stop the bleeding.

I could change the compatibility mode of the database to revert back to SQL Server 2012 (before the new cardinality estimator was introduced), but then I miss out on being able to use all of the new SQL Server 2016 improvements just because a few queries are misbehaving.

I could enable trace flag 9481 to have my queries use the old cardinality estimator, however as a developer I probably don't have access to play with trace flags (and for good reason).

Starting with 2016 SP1, what I can do is use the legacy cardinality estimator query hint:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

This hint is great because it doesn't require developers to have any special permissions.  It also allows SQL to use the old cardinality estimator for poor performing queries only - the rest of the server gets to benefit from the improvements brought on by the new cardinality estimator.

With time, I can revisit the queries that are using this hint to try to improve their performance, but in the mean time it's a great way to fix regressed query performance due to the new cardinality estimator.

ʼ;ŚℇℒℇℂƮ *: How Unicode Homoglyphs Can Thwart Your Database Security

niketh-vellanki-202943

For the past couple weeks I've been writing about how to protect your database from a SQL injection attack.  Today, we will keep the trend going by looking at how implicit unicode conversions can leave your data vulnerable.

Watch this week's video on YouTube

What's a homoglyph?

A homoglyph is a character that looks like another character. l  (lowercase "L") and 1  (the number) are considered homoglyphs.  So are O  (the letter) and 0  (the number).

Homoglpyhs can exist within a character set (like the Latin character set examples above) or they can exist between character sets.  For example, you may have the unicode apostrophe ʼ, which is a homoglyph to the Latin single quote character ' .

How does SQL Server handle unicode homoglyphs?

Funny you should ask.  If you pass in a unicode character to a non-unicode datatype (like char), SQL implicitly converts the unicode character to its closest resembling non-unicode homoglyph.

To see this in action, we can use the unicode apostrophe from the example above:

SELECT 
    CAST(N'ʼ' AS nchar) AS UnicodeChar, 
    CAST(N'ʼ' AS char) AS NonUnicodeChar

You can see in the second column SQL automatically converted the apostrophe to a single quote:

2017-09-07_20-23-49

Although this implicit character conversion can be convenient for when you want to display unicode characters in a non-unicode character set, it can spell disaster for your SQL Server security.

Unicode Homoglyph SQL Injection

If you are already using sp_executesql or QUOTENAME() when building your dynamic SQL queries then you are safe from this type of SQL injection.

I know you aren't the kind of person who would ever write your own security functions when solid, safe, and tested functions like the above are available.  However, just this one time let's pretend you think you can outsmart a hacker by writing your own quote delimiting code.

Using the same dataset as last week, let's create a new stored procedure that is going to return some data from a user's profile:

DROP PROCEDURE IF EXISTS dbo.GetProfile
GO
CREATE PROCEDURE dbo.GetProfile
    @Username nvarchar(100)
AS
BEGIN
    -- Add quotes to escape injection...or not?
    SET @Username = REPLACE(@Username, '''','''''')

    DECLARE @Query varchar(max)

    SET @Query = 'SELECT 
                    FullName, 
                    JoinDate
                FROM
                    dbo.RegisteredUser
                WHERE
                    UserName = ''' + @Username + '''
                    '

    EXEC(@Query)
END
GO

Instead of using sp_executesql or QUOTENAME(), let's try to write our own clever REPLACE() function that will replace single quotes with two sets of single quotes.  This should, in theory, prevent SQL injection.

If we test out a "normal" attempt at SQL injection, you'll notice this logic works great.  Give yourself a pat on the back!

2017-09-07_20-42-32

However, if we pass in a unicode apostrophe...:

2017-09-07_20-45-23

The reason this happens is because we declared our @Query parameter as varchar instead of the unicode nvarchar.  When we build our dynamic SQL statement, SQL implicitly converts the nvarchar @Username parameter to the non-unicode varchar:

2017-09-07_20-51-19

So if I replace apostrophes will that make me safe?

No.

I know it seems like black listing/replacing the unicode apostrophe would solve all of our problems.

And it would...in this scenario only.  There are more unicode homoglpyhs than just an apostrophe though.

Out of curiosity I wrote a script to search through the unicode character space to see what other homoglyphs exist:

DECLARE @FirstNumber INT=0;
-- number of possible characters in the unicode space
DECLARE @LastNumber INT=1114112;

WITH Numbers AS (
    SELECT @FirstNumber AS n
    UNION ALL
    SELECT n+1 FROM Numbers WHERE n+1<=@LastNumber
), UnicodeConversion AS (
SELECT
       n AS CharacterNumber,
       CASE CAST(NCHAR(n) as CHAR(1))
              WHEN '''' THEN NCHAR(n)
              WHEN ';' THEN NCHAR(n)
       END AS UnicodeCharacter,
       CAST(NCHAR(n) as CHAR(1)) AS ASCIICharacter
FROM Numbers
)
SELECT
       *
FROM
       UnicodeConversion
WHERE
       UnicodeCharacter IS NOT NULL
OPTION (MAXRECURSION 0)

2017-09-07_20-58-34

Although the characters in the above screen shot might look similar, they are actually homoglyphs.

I decided to only search for single quotes and semi-colons since they are frequently used in SQL injection attacks, but this by no means is an extensive list of all of the characters you would want to blacklist.

Not only would it be very difficult to confidently blacklist every dangerous homoglyph, but new characters are being added to unicode all of the time so maintaining a black list would be a maintenance nightmare.  Especially if the person maintaining this code in the future isn't familiar with these types of injection attacks.

And don't be cheeky thinking you can filter out dangerous SQL keywords either - even if you REPLACE(@Username,'SELECT',''), just remember someone can come by and pass in a value like 'ŚεℒℇℂƮ'.

Conclusion

Don't write your own security functions - they will fail.

Your best protection against SQL injection is to not use dynamic SQL.  If you have to use dynamic SQL, then use either sp_executesql and QUOTENAME().