Photo by Fischer Twins on Unsplash

You probably tune your queries for maximum performance.  You take pride in knowing how to add indexes and refactor code in order to squeeze out every last drop of your server’s performance potential.  Speed is usually king.

That’s why you probably don’t use SQL Server’s WAITFOR command regularly – it actually makes your overall query run slower.

However, slowness isn’t always a bad thing.  Today I want to show you two of my favorite ways for using the WAITFOR command.

You can also watch this week’s content on my YouTube channel.

1. Building A Human

Modern day computers are fast.  CPUs perform billions of actions per second, the amount of RAM manufacturers can cram onto a stick increases regularly, and SSDs are quickly making disk I/O concerns a thing of the past.

While all of those things are great for processing large workloads, they move computers further and further away from “human speed”.

But “human speed” is sometimes what you want.  Maybe you want to simulate app usage on your database or the load created by analysts running ad hoc queries against your server.

This is where I love using WAITFOR DELAY – it can simulate humans executing queries extremely well:
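A minimal sketch of the idea (dbo.Sales and the dates are placeholders):

    SELECT * FROM dbo.Sales WHERE SaleDate >= '2017-01-01';
    WAITFOR DELAY '00:00:02';  -- 2 seconds of "think time", like a human pausing between queries
    SELECT * FROM dbo.Sales WHERE SaleDate >= '2017-06-01';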

Throw in some pseudo-random number generation and some IF statements, and you have a fake server load you can start using:
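Here is one possible sketch of that idea (again, dbo.Sales is a stand-in for whatever tables you want to hammer):

    -- Loop forever, running a random query after a random 0-9 second pause
    WHILE 1 = 1
    BEGIN
        DECLARE @Wait VARCHAR(8) = '00:00:0' + CAST(CAST(RAND() * 10 AS INT) AS CHAR(1));

        IF RAND() < 0.5
            SELECT COUNT(*) FROM dbo.Sales;                    -- simulate a cheap ad hoc query
        ELSE
            SELECT TOP 100 * FROM dbo.Sales ORDER BY SaleDate; -- simulate a heavier read

        WAITFOR DELAY @Wait;  -- WAITFOR DELAY accepts a variable, but not an inline expression
    END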

2. Poor Man’s Service Broker

Service Broker is a great feature in SQL Server.  It handles messaging and queuing scenarios really well, but requires more setup time so I usually don’t like using it in scenarios where I need something quick and dirty.

Instead of having to set up Service Broker to know when some data is available or a process is ready to be kicked off, I can do the same with a WHILE loop and a WAITFOR:
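A sketch of the pattern (dbo.StagingData is a placeholder for whatever table or flag you are watching):

    -- Poll every 10 seconds until rows show up, then run the real query
    WHILE NOT EXISTS (SELECT 1 FROM dbo.StagingData)
    BEGIN
        WAITFOR DELAY '00:00:10';
    END

    SELECT * FROM dbo.StagingData;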

Fancy? No.  Practical? Yes.

No longer do I need to keep checking a table for results before I run a query – I can have WAITFOR do that for me.

If you know there is a specific time you want to wait until before you start pinging some process, you can incorporate WAITFOR TIME to make your checking even more intelligent:
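For example, if the upstream process never finishes before 2 AM, there is no point polling before then (the time and table are placeholders):

    WAITFOR TIME '02:00';  -- sleep until 2 AM before we even start checking

    WHILE NOT EXISTS (SELECT 1 FROM dbo.StagingData)
    BEGIN
        WAITFOR DELAY '00:00:10';
    END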

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Photo by James Sutton on Unsplash

Note: the problem described below applies to all SELECT queries, not just those adorned with NOLOCK hints.  The fact that it applies to NOLOCK queries was a huge surprise to me though, hence the title.

Lots of people don’t like NOLOCK (i.e. the read uncommitted isolation level) because it can return inaccurate data.  I’ve seen plenty of arguments cautioning developers against uncommitted reads because of how they can return dirty data, phantom reads, and non-repeatable reads.

I’ve known about all of those problems, but there’s one problem that I had never heard of until recently: NOLOCK can block other queries from running.

Watch this week’s post on YouTube

Let’s step back and understand why I’ve so often used NOLOCK in the past.  A fairly typical instance of when I use NOLOCK is when I want to let a query run overnight to return some large set of data.  I’m okay with some inconsistencies in the data (from dirty reads, etc…).  My primary concern is that I don’t want the long running query to get in the way of other processes.

I always thought NOLOCK was a perfect solution for this scenario because it never locks the data that it reads – the results might not be perfect, but at least the query won’t negatively impact any other process on the server.

This is where my understanding of NOLOCK was wrong: while NOLOCK won’t lock row level data, it will take out a schema stability lock.

A schema stability (Sch-S) lock prevents the structure of a table from changing while the query is executing.  All SELECT statements, including those in the read uncommitted/NOLOCK isolation level, take out a Sch-S lock.  This makes sense because we wouldn’t want to start reading data from a table and then have the column structure change half way through the data retrieval.

However, this also means there might be some operations that get blocked by a Sch-S lock.  For example, any command requesting a schema modification (Sch-M) lock gets blocked in this scenario.

What commands request Sch-M locks?

Things like an index REBUILD or running sp_recompile on a table.  These are the types of commands running in my nightly maintenance jobs that I was trying to avoid hurting by using NOLOCK in the first place!

To reiterate, I used to think that using the NOLOCK hint was a great way to prevent blocking during long running queries.  However, it turns out that my NOLOCK queries were actually blocking my nightly index jobs (all SELECT queries block in this example, but I find the NOLOCK to be particularly misleading), which then caused other SELECT statements to get blocked too!

Let’s take a look at this in action.  Here I have a query that creates a database, table, and then runs a long running query with NOLOCK:
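Here is a sketch of that setup (names and row counts are illustrative; the cross joins just manufacture a billion-row read):

    CREATE DATABASE NolockDemo;
    GO
    USE NolockDemo;
    GO
    CREATE TABLE dbo.Numbers (Number INT);
    INSERT INTO dbo.Numbers (Number)
    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.objects o1 CROSS JOIN sys.objects o2;
    CREATE CLUSTERED INDEX CL_Number ON dbo.Numbers (Number);
    GO
    -- The long running query: ~1 billion rows via cross joins, read with NOLOCK
    SELECT COUNT(*)
    FROM dbo.Numbers a WITH (NOLOCK)
    CROSS JOIN dbo.Numbers b WITH (NOLOCK)
    CROSS JOIN dbo.Numbers c WITH (NOLOCK);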

Now, while that billion row read is occurring, we can verify that the query took out a Sch-S lock by looking at sys.dm_tran_locks:
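From a second session, something like this shows the lock:

    SELECT resource_type, request_mode, request_status, request_session_id
    FROM sys.dm_tran_locks
    WHERE resource_type = 'OBJECT';  -- look for request_mode = 'Sch-S'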

Sch-S lock granted

While that’s running, if we try to rebuild an index, that rebuild is blocked (shown as a WAIT):
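From a third session:

    ALTER INDEX CL_Number ON dbo.Numbers REBUILD;  -- requests a Sch-M lock and waits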

rebuild is blocked

Our index rebuild query will remain blocked until our billion row NOLOCK SELECT query finishes running (or is killed).  This means the query that I intended to be completely unobtrusive is now blocking my nightly index maintenance job from running.

Even worse, any other queries that try to run after the REBUILD query (or any other commands that request a Sch-M lock) are going to get blocked as well!  If I try to run a simple COUNT(*) query:
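Continuing the sketch:

    SELECT COUNT(*) FROM dbo.Numbers;  -- queued behind the waiting Sch-M lock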

chained blocks

Blocked!  This means that not only is my initial NOLOCK query causing my index REBUILD maintenance jobs to wait, the Sch-M lock placed by the REBUILD maintenance job is causing any subsequent queries on that table to get blocked and be forced to wait as well.  I just derailed the timeliness of my maintenance job and subsequent queries with a blocking NOLOCK statement!

Solutions

Unfortunately this is a tough problem and there’s no one-size-fits-all remedy.

Solution #1: Don’t run long running queries

I could avoid running long queries at night when they might run into my index maintenance jobs.  This would prevent those index maintenance jobs and subsequent queries from getting delayed, but it means my initial billion row select query would then have to run earlier, negatively impacting server performance during a potentially busier time of day.

Solution #2: Use WAIT_AT_LOW_PRIORITY

Starting in 2014, I could do an online index rebuild with the WAIT_AT_LOW_PRIORITY option set:
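A sketch of the syntax (the option values shown are one possible configuration; ONLINE rebuilds require Enterprise edition):

    ALTER INDEX CL_Number ON dbo.Numbers REBUILD
    WITH (ONLINE = ON (
        WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS)
    ));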

This query basically gives any blocking SELECT queries currently running 1 minute to finish executing, or else this query will kill them and then execute the index rebuild.  Alternatively, we could have set ABORT_AFTER_WAIT = SELF and the rebuild query would kill itself, allowing the NOLOCK billion row SELECT to finish running without preventing any other queries from running.

This is not a great solution because it means either the long running query gets killed or the index REBUILD gets killed.

Solution #3: REBUILD if no Sch-S, REORGANIZE otherwise

A programmatic solution can be written that tries to REBUILD the index, but falls back to REORGANIZE if it knows it will have to wait for a Sch-M lock.

I’ve created the boiler plate below as a starting point, but the sky is the limit with what you can do with it (e.g. create a WHILE loop to check for the lock every x seconds, create a timeout for when the script should stop trying to REBUILD and just REORGANIZE instead, etc…)
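Here is that starting point, a sketch using the demo table from above (note there is an unavoidable race between the check and the rebuild):

    IF NOT EXISTS (
        SELECT 1
        FROM sys.dm_tran_locks
        WHERE resource_type = 'OBJECT'
            AND resource_database_id = DB_ID()
            AND resource_associated_entity_id = OBJECT_ID('dbo.Numbers')
            AND request_mode = 'Sch-S'
    )
        ALTER INDEX CL_Number ON dbo.Numbers REBUILD;    -- no Sch-S holders, safe to rebuild
    ELSE
        ALTER INDEX CL_Number ON dbo.Numbers REORGANIZE; -- someone is reading, settle for a reorganize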

This solution is my favorite because:

  1. Ad hoc long running queries don’t get killed (all of that time spent processing doesn’t go to waste)
  2. Other select queries are not blocked by the Sch-M lock attempt by REBUILD
  3. Index maintenance still occurs, even if it ends up being a REORGANIZE instead of a REBUILD

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Photo by Patrick Tomasso on Unsplash

How many times have you known that adding an index would improve query performance but you weren’t exactly sure which type of index to add?

This happened to me all the time in my first few years (and maybe an extra year or two after that) of working with SQL Server.

Today I want to help alleviate some of that confusion by comparing two of the most common index types: clustered and nonclustered rowstore indexes.

Watch this post on YouTube, or continue reading below if that’s more your style.

Clustered Indexes

Every table’s data has some natural order to it.

If the order is random and not explicitly defined then that table is known as a heap.  With the exception of a few special cases, we generally don’t want to have heaps.  Heaps don’t perform well for the majority of queries because SQL Server has no meta knowledge about where data is stored within a heap.

If we don’t have a random heap, that means we have defined the order that data should be stored in a table.  The physical storage order of our data is defined by our clustered index.

Every table can have at most one clustered index because the data in a table can only be stored in one order, i.e. you can’t have that table’s data physically stored on the disk in more than one order.

What are the benefits of a clustered index?

The data in a clustered index is stored in order.  That means:

  1. Finding the data you need in your clustered index is a matter of knowing where to look in our alphabetical list of data.  Computers are really good at doing this.
  2. If your data needs to be outputted in the same order that it’s stored in – presto! – SQL doesn’t need to do any additional sorting.

A lot of people like to put the clustered index on their table’s primary key (PK).  This is usually fine because a lot of the time our primary key is likely to be our most used field for joins, WHERE clauses, etc…

Some people think they can ONLY put their clustered index on their PK.  That’s not true! Oftentimes it can be much more beneficial to put your clustered index on something that isn’t your PK, like a different column that is getting more use than our PK.  For example, you might have an identity column set as your PK, but every query against your table is filtering and sorting on a datetime2 column.  Why store your table in PK order if you are always going to be filtering and returning data on that datetime2 column?  Put that clustered index on the datetime2 column!
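A hypothetical example, where the PK stays nonclustered and the clustered index goes on the datetime2 column:

    CREATE TABLE dbo.Orders (
        Id INT IDENTITY PRIMARY KEY NONCLUSTERED,
        OrderDate DATETIME2 NOT NULL
    );
    CREATE CLUSTERED INDEX CL_Orders_OrderDate ON dbo.Orders (OrderDate);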

The downside to having data stored in this order is that actions like inserts and updates take longer, because SQL has to put them into the correct sorted order of the table pages – it can’t just quickly tack them onto the end.

Another major benefit of a clustered index is that we don’t have to “include” any additional data in our index.  All of the data for our row exists right beside our indexed columns.  This is not necessarily true of other index types (see nonclustered indexes below).

Pretend our clustered index is like the white pages of a phone book (note to future SQL developers in 2030 who have no idea what a phonebook is: it’s something that stores the names, addresses, and landline phone numbers in your area.  What’s a landline?  Oh boy…)

The phone book stores every person’s name in alphabetical order, making it easy to look up certain individuals.  Additionally, if we look someone up, we immediately have their address and phone number right there next to their name – no additional searching necessary!

This is a great feature of clustered indexes – if you ever need to retrieve many or all columns from your table, a clustered index will usually be efficient because once it finds the indexed value you are searching on, it doesn’t need to go anywhere else to get the remaining data from that row.

Nonclustered Indexes

If a clustered index is like a phone book, a nonclustered index is like the index in the back of a chemistry textbook.  The chemistry textbook has some natural order to it (“Chapter 1: Matter, Chapter 2: Elements, Chapter 3: Compounds, etc…”).  However, this order doesn’t help us if we want to look up the location of something specific, like “noble gases”.

So what do we do?  We go to the index in the back of the textbook which lists all topics in alphabetical order, making it easy to find the listing for “noble gases” and the page number they are discussed on.  Once we know the page number for noble gases from our index, we can flip to the correct page and get the data we need.

This book index represents our nonclustered index.  A nonclustered index contains the ordered data for the columns specified in that index, with pointers (book page numbers) that tell us where to go to find the rest of the data from that row (flip to the right book page).  That means unlike a clustered index where all data is always present, using a nonclustered index often is a two step process: find the value of interest in the index and then go look up the rest of that row’s data from where it actually exists on disk.

What are the benefits of a nonclustered index?

We can have as many nonclustered indexes on our tables as we want (well, we max out at 999).  That’s great! Create an index for every column!

Well, no, don’t do that.  There’s overhead in creating nonclustered indexes.  Essentially, every time you index some column(s), you are duplicating the unique values in those column(s) so that they can be stored in sorted order in your index.  We get speed and efficiency in our data lookups, but with the cost of losing disk space.  You need to test and see for each table and set of queries what the optimal number of indexes is.  Adding an additional index can absolutely destroy performance, so always test your changes!

Additionally, using a nonclustered index to find an indexed column’s value is fast (SQL is just going through the ordered index data to find the value it needs – once again, something computers are really good at doing).  However, if you need other columns of data from the row that you just looked up, SQL is going to have to use those index pointers to go find the rest of that row data somewhere else on disk.  This can really add up and slow down performance.

If those additional lookups are hurting performance, what you can do is INCLUDE your nonindexed columns in your nonclustered index.  What this basically does is in addition to storing the sorted values of your indexed column(s), the index will also store whatever additional values you want to include as part of the index itself.  Once again, you’ll probably get better performance because SQL won’t have to go to somewhere else on disk to find the data it needs, but you lose storage space because you are creating duplicates of that data as part of your index.
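A hypothetical example of such a covering index, where Email tags along without being part of the key:

    CREATE NONCLUSTERED INDEX IX_Users_LastName
    ON dbo.Users (LastName)
    INCLUDE (Email);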

Example Usage Scenarios

Note: I want to clarify that the above definitions and below examples don’t cover lots of corner cases (blob values, fragmentation, etc…).  I wanted this post to be a simple starting point when people don’t know what index type they should try adding first, because this was the paralysis that I had when starting out.

Every statement in this article can probably have an asterisk appended to the end of it, pointing out some example where a recommendation I wrote is 100% wrong.  ALWAYS test your index changes, because what might improve one query may hurt another one already running on that table, and over time you will learn about all of those edge cases and how they affect index performance.

Alright let’s take a look at a few common scenarios and what the best index for them might be.  After reading each scenario, take a guess about what kind of index you would add and then click on the answer to reveal what I would do in that scenario.  Assume no indexes exist yet on these tables unless otherwise noted.

  • You have OLTP data that’s used only for transactional reads and writing new rows. You know the primary key is an identity integer column.  What type of index would you create for the primary key?
    Answer

    Clustered index – Your queries are probably always going to be looking up by PK to return data.  If you store the data in the table ordered by that PK, SQL will be able to do this very quickly.  New row additions to the table will always get put at the end because of the auto-incrementing identity column, not creating any overhead for having to insert data in a specific location in the ordered data.

  • You have a query that wants to return most or all of the columns from a table.  What type of index would make this the most efficient?
    Answer

    Clustered index – Since all of the column values are stored in the same location as your indexed fields, SQL won’t have to go do any additional lookups to get all of the data you are requesting from it.  If you created a nonclustered index you would have to INCLUDE all nonindexed columns, which would take up lots of space since you are essentially duplicating your entire table’s data.

  • You have a table that is constantly having values updated.  These updated values are used in your JOINs and WHERE clauses.  What type of index would you add?
    Answer

    Nonclustered index – If our values are constantly changing, SQL only has to update the index and pointers while putting the actual data wherever it has available space on disk.  Compare this to a clustered index where it has to put the inserted/updated data in the correct order, meaning potentially lots of operations to shift the data around if available free space doesn’t exist at that location.

  • You have a table that already has a clustered index, but it doesn’t cover columns in JOINs and WHERE clauses.  What do you do?
    Answer

    Nonclustered index – since the clustered index already exists, your only option is to add a nonclustered index.  Depending on the queries hitting this table however, you may want to consider changing your clustered index to a nonclustered index if you think your JOINs and WHERE clauses will be improved by having those fields be part of the clustered index.  Test it out!

  • You have a small staging table that you will always read all rows from and then truncate.  You don’t care about the order.  Do you add an index?
    Answer

    No, leave it as a heap – This is one scenario where not adding an index can give you better performance since there is no overhead in SQL having to store things in a sorted order or update indexes to specify the order.  If you truly don’t care about the order, and you will always be reading all rows from a table and then truncating the table, then it’s better not to have the overhead of having indexes on the table.


Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Photo by Niketh Vellanki on Unsplash

For the past couple weeks I’ve been writing about how to protect your database from a SQL injection attack.  Today, we will keep the trend going by looking at how implicit unicode conversions can leave your data vulnerable.

You can also watch this content on my YouTube channel.

What’s a homoglyph?

A homoglyph is a character that looks like another character.  l (lowercase “L”) and 1 (the number) are considered homoglyphs.  So are O (the letter) and 0 (the number).

Homoglyphs can exist within a character set (like the Latin character set examples above) or they can exist between character sets.  For example, you may have the unicode apostrophe ʼ, which is a homoglyph of the Latin single quote character '.

How does SQL Server handle unicode homoglyphs?

Funny you should ask.  If you pass in a unicode character to a non-unicode datatype (like char), SQL implicitly converts the unicode character to its closest resembling non-unicode homoglyph.

To see this in action, we can use the unicode apostrophe from the example above:
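A sketch of the demonstration (NCHAR(700) is the unicode apostrophe U+02BC; the implicit conversion below assumes a typical Latin collation):

    SELECT NCHAR(700) AS UnicodeApostrophe,
           CAST(NCHAR(700) AS CHAR(1)) AS ImplicitConversion;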

You can see in the second column SQL automatically converted the apostrophe to a single quote:

Although this implicit character conversion can be convenient for when you want to display unicode characters in a non-unicode character set, it can spell disaster for your SQL Server security.

Unicode Homoglyph SQL Injection

If you are already using sp_executesql or QUOTENAME() when building your dynamic SQL queries then you are safe from this type of SQL injection.

I know you aren’t the kind of person who would ever write your own security functions when solid, safe, and tested functions like the above are available.  However, just this one time let’s pretend you think you can outsmart a hacker by writing your own quote delimiting code.

Using the same dataset as last week, let’s create a new stored procedure that is going to return some data from a user’s profile:

Instead of using sp_executesql or QUOTENAME(), let’s try to write our own clever REPLACE() function that will replace single quotes with two sets of single quotes.  This should, in theory, prevent SQL injection.
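A sketch of what that ill-advised procedure might look like (it reuses the RegisteredUser table from the injection posts further down; all names are stand-ins):

    CREATE PROCEDURE dbo.GetProfile @Username NVARCHAR(100)
    AS
    BEGIN
        DECLARE @Query VARCHAR(300);  -- note: varchar, not nvarchar, and that's the bug
        SET @Query = 'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = '''
                   + REPLACE(@Username, '''', '''''')  -- our homemade quote "sanitizer"
                   + '''';
        EXEC (@Query);
    END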

If we test out a “normal” attempt at SQL injection, you’ll notice this logic works great.  Give yourself a pat on the back!
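For example:

    EXEC dbo.GetProfile @Username = 'TFly37'' or 1=1--';  -- the REPLACE doubles the quote, so this stays harmless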

However, if we pass in a unicode apostrophe…:
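For example (the apostrophe in the string below is the unicode homoglyph, not a regular single quote):

    EXEC dbo.GetProfile @Username = N'TFly37ʼ or 1=1--';  -- REPLACE finds nothing to escape, injection succeeds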

This happens because we declared our @Query variable as varchar instead of the unicode nvarchar.  When we build our dynamic SQL statement, SQL implicitly converts the nvarchar @Username parameter to the non-unicode varchar:
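The generated query ends up looking something like this, with the homoglyph converted into a real, query-terminating single quote:

    SELECT FullName FROM dbo.RegisteredUser WHERE UserName = 'TFly37' or 1=1--'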

So if I replace apostrophes, will that make me safe?

No.

I know it seems like black listing/replacing the unicode apostrophe would solve all of our problems.

And it would…in this scenario only.  There are more unicode homoglyphs than just an apostrophe though.

Out of curiosity I wrote a script to search through the unicode character space to see what other homoglyphs exist:
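A sketch of that brute-force search (it walks every code point NCHAR() accepts and checks what each one implicitly converts to):

    DECLARE @i INT = 0;
    DECLARE @Results TABLE (CodePoint INT, UnicodeChar NCHAR(1), ConvertsTo CHAR(1));

    WHILE @i <= 65535
    BEGIN
        -- keep any character whose varchar conversion is a quote or semi-colon
        IF CAST(NCHAR(@i) AS CHAR(1)) IN ('''', ';') AND @i NOT IN (39, 59)
            INSERT INTO @Results VALUES (@i, NCHAR(@i), CAST(NCHAR(@i) AS CHAR(1)));
        SET @i = @i + 1;
    END

    SELECT * FROM @Results;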

Although the characters in the above screen shot might look similar, they are actually homoglyphs.

I decided to only search for single quotes and semi-colons since they are frequently used in SQL injection attacks, but this by no means is an extensive list of all of the characters you would want to blacklist.

Not only would it be very difficult to confidently blacklist every dangerous homoglyph, but new characters are being added to unicode all of the time so maintaining a black list would be a maintenance nightmare.  Especially if the person maintaining this code in the future isn’t familiar with these types of injection attacks.

And don’t be cheeky thinking you can filter out dangerous SQL keywords either – even if you REPLACE(@Username, 'SELECT', ''), just remember someone can come by and pass in a value like 'ŚεℒℇℂƮ'.

Conclusion

Don’t write your own security functions – they will fail.

Your best protection against SQL injection is to not use dynamic SQL.  If you have to use dynamic SQL, then use sp_executesql and QUOTENAME().

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Protecting against SQL Injection Part 2

Photo by Igor Ovsyannykov on Unsplash

Watch this content on YouTube.

Last week we talked about building dynamic SQL queries and how doing so might leave you open to SQL injection attacks.

In that post we examined how using sp_executesql to parameterize our dynamic SQL statements protects us.

Today, we are going to look at where sp_executesql falls short.

The problem with sp_executesql

Let’s modify last week’s final query to make our table name dynamic:
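A sketch of that modification (building on the dbo.GetFullName example from the previous post, which appears further down in this write-up), naively trying to pass the table name as a parameter:

    ALTER PROCEDURE dbo.GetFullName
        @ParmUserName VARCHAR(100),
        @ParmTableName NVARCHAR(128)
    AS
    BEGIN
        DECLARE @FullQuery NVARCHAR(1000);
        SET @FullQuery = N'SELECT FullName FROM @ParmTableName WHERE UserName = @ParmUserName';
        EXEC sp_executesql @FullQuery,
             N'@ParmUserName VARCHAR(100), @ParmTableName NVARCHAR(128)',
             @ParmUserName, @ParmTableName;
    END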

(there’s an entirely different discussion to be had about whether you should allow table and column names to be dynamic, but we’ll assume we need a dynamic table name for the rest of this post)

If we pass in a table name parameter value and execute this query, we’ll be greeted with this error:
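The exact wording varies a little by version, but it boils down to SQL Server treating the parameter as an undeclared table variable:

    Must declare the table variable "@ParmTableName".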

Yeah, sp_executesql doesn’t like parameterizing table names.

So how do we make table name dynamic and safe?

Unfortunately we have to fall back on SQL’s EXEC command.

However, like we discussed last week, we need to be vigilant about what kind of user input we allow to be built as part of our query.

Assuming our app layer is already sanitizing as much of the user input as possible, here are some precautions we can take on the SQL side of the equation:

QUOTENAME()

If we wrap our user input parameters with QUOTENAME(), we’ll achieve safety:
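Continuing the sketch from above, the table name still gets concatenated, but only after QUOTENAME() has bracketed it:

    ALTER PROCEDURE dbo.GetFullName
        @ParmUserName VARCHAR(100),
        @ParmTableName NVARCHAR(128)
    AS
    BEGIN
        DECLARE @FullQuery NVARCHAR(1000);
        SET @FullQuery = N'SELECT FullName FROM ' + QUOTENAME(@ParmTableName)
                       + N' WHERE UserName = @ParmUserName';
        EXEC sp_executesql @FullQuery, N'@ParmUserName VARCHAR(100)', @ParmUserName;
    END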

This results in:
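…a generated query where any mischief in the table name stays trapped inside brackets (illustrative output):

    SELECT FullName FROM [RegisteredUser] WHERE UserName = @ParmUserName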

Although QUOTENAME() works well by adding brackets (by default) to sanitize quotes, its downside is that it will only output strings up to 128 characters long. If you are expecting parameters with values longer than that, you will have to use something like REPLACE(@TableName,'''','''''') instead to delimit single quotes (however, rolling your own logic like this is really hard to do securely and not recommended).

EXECUTE AS

The account running any dynamic SQL queries should be locked down so that it won’t be able to perform any operations you don’t want it to do.

Taking this idea one step further, you can create another account with very limited permissions, and add EXECUTE AS to your stored procedure to run that code under the very limited account.
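A sketch of the syntax (LimitedUser is a placeholder for your locked-down account):

    CREATE PROCEDURE dbo.GetFullNameLimited @ParmUserName VARCHAR(100)
    WITH EXECUTE AS 'LimitedUser'
    AS
    BEGIN
        SELECT FullName FROM dbo.RegisteredUser WHERE UserName = @ParmUserName;
    END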

This won’t prevent injection, but it will limit what the malicious user is able to do.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

“Medicine 1” by marosh is licensed under CC BY-NC-ND 2.0.

Looking for a script to find possible SQL injection vulnerabilities on your server? Scroll to the bottom of this post.

Watch this content on YouTube.


OWASP names SQL injection as the #1 vulnerability for web applications. The infamy of this attack has even made its way into the popular XKCD comic.

What is SQL injection?

A SQL query is vulnerable to SQL injection if a user can run a query other than the one that was originally intended.

Sometimes SQL injection is easier to understand with an example. Let’s use this table of registered users:
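A sketch of that table (the TFly37 username is referenced later in this post; the rows are otherwise made-up filler):

    CREATE TABLE dbo.RegisteredUser (
        Id INT IDENTITY PRIMARY KEY,
        UserName VARCHAR(100),
        FullName VARCHAR(100)
    );

    INSERT INTO dbo.RegisteredUser (UserName, FullName)
    VALUES ('TFly37', 'Tim Fly'), ('JDoe12', 'Jane Doe'), ('BSmith9', 'Bob Smith');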

And then let’s create a simple stored procedure that will query that table:
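A sketch of the vulnerable procedure:

    CREATE PROCEDURE dbo.GetFullName @ParmUserName VARCHAR(100)
    AS
    BEGIN
        DECLARE @FullQuery VARCHAR(1000);
        SET @FullQuery = 'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = '''
                       + @ParmUserName + '''';
        EXEC (@FullQuery);  -- build the query string, then execute it
    END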

The important thing to note in the query above is that we are generating a dynamic SQL statement; that is, we are building the SQL query string, and then we are executing it.

Imagine this stored procedure is running in order to display a “Welcome <Full Name>!” message in our app — a website visitor types in their @ParmUserName and we execute the stored procedure to return their full name.

Here’s our code that calls the stored procedure:
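In sketch form:

    EXEC dbo.GetFullName @ParmUserName = 'TFly37';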

And result:

Cool. No problems so far.

However, what if our user decides to pass in the following value as their username?
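The doubled quote below is just T-SQL string escaping; the actual value contains a single quote:

    EXEC dbo.GetFullName @ParmUserName = 'TFly37'' or 1=1--';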

This funny looking parameter value returns this:

AHHHHHH!!!!

This user just hacked our website and viewed all of the users in our table.

In this specific case only our user’s full names were breached, but in other instances it’s just as easy for more sensitive data like passwords, social security numbers, and bank account numbers to be revealed as well (If you are looking for some fun, search for “SQL injection” on the Have I been pwned? list of Pwned websites to see all of the companies that aren’t doing a good job protecting your privacy).

So how did that example of SQL injection actually work?

Since our stored procedure executes a dynamically generated query, let’s look at what that generated query actually looks like for both of the parameters that were passed in:
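Roughly:

    -- With the normal parameter:
    SELECT FullName FROM dbo.RegisteredUser WHERE UserName = 'TFly37'

    -- With the malicious parameter (the quote closes the string, and 1=1 matches every row):
    SELECT FullName FROM dbo.RegisteredUser WHERE UserName = 'TFly37' or 1=1--'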

Even though TFly37'' or 1=1-- doesn’t look like an intelligible input parameter, when it’s concatenated into our query it makes sense.

Our malicious user is basically writing their own SQL query, one that will return all of the names of our users instead of just their own. In many instances, the crafty hacker can go a step further and write additional injection code to reveal the contents of the entire user table (or other tables in the database)!

How do I prevent SQL injection?

Simple: don’t concatenate unsanitized user input data to your SQL queries.

In this example, this is easy to do: simply rewrite the stored procedure to not use dynamic SQL:
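A sketch of the rewrite:

    ALTER PROCEDURE dbo.GetFullName @ParmUserName VARCHAR(100)
    AS
    BEGIN
        -- the parameter is only ever treated as data, never as executable SQL
        SELECT FullName FROM dbo.RegisteredUser WHERE UserName = @ParmUserName;
    END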

When you don’t have dynamic SQL, you can’t be affected by SQL injection attempts.

Avoiding dynamic SQL isn’t always possible though, so what can we do in those cases? Use sp_executesql:
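A sketch of the same procedure, parameterized through sp_executesql:

    ALTER PROCEDURE dbo.GetFullName @ParmUserName VARCHAR(100)
    AS
    BEGIN
        DECLARE @FullQuery NVARCHAR(1000);
        SET @FullQuery = N'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = @ParmUserName';
        EXEC sp_executesql @FullQuery,
             N'@ParmUserName VARCHAR(100)',
             @ParmUserName = @ParmUserName;
    END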

sp_executesql allows for parameterization of your dynamic SQL, removing the possibility of unwanted code injection.

Are any of my current SQL queries vulnerable to SQL injection attacks?

So SQL injection is really bad and you don’t want to become like Sony or Yahoo. How do we check to see if we have any vulnerable queries on our server?

I wrote the query below to help you start searching. It is not perfect, but it does act as a good starting point for auditing potentially injectable queries:
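Here is a rough sketch of the idea rather than the author’s exact script; it flags modules that EXEC() a concatenated string, which is where injectable dynamic SQL usually hides:

    SELECT
        OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
        OBJECT_NAME(m.object_id) AS ObjectName,
        m.definition
    FROM sys.sql_modules m
    WHERE m.definition LIKE '%EXEC (%+%'
       OR m.definition LIKE '%EXEC(%+%'
       OR m.definition LIKE '%EXECUTE (%+%'
       OR m.definition LIKE '%EXECUTE(%+%';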

I’ve found this script useful for myself, however if you find any issues with it please let me know, thanks!

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Unexpected SQL Server Performance Killers #3

Photo by Cibi Chakravarthi on Unsplash

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of “things I didn’t know I was doing wrong for years.”

Watch this post on YouTube.


Looking for a script to find non-SARGable queries on your server? Scroll to the bottom of this post.

What is a “SARGable” query?

Just because you add an index to your table doesn’t mean you get immediate performance improvement. A query running against that table needs to be written in such a way that it actually takes advantage of that index.

SARGable, or “Search ARGument-able”, queries are therefore queries that are capable of utilizing indexes.

Examples please!

Okay let’s see some examples of SARGable and non-SARGable queries using my favorite beverage data.

There are non-clustered indexes on the Name and CreateDate columns

First, let’s look at a non-SARGable query:
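Assuming the dbo.CoffeeInventory table used throughout this series, the non-SARGable version wraps the indexed column in a function:

    SELECT Name
    FROM dbo.CoffeeInventory
    WHERE CONVERT(CHAR(10), CreateDate, 121) = '2017-08-19';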

Although this query correctly filters our rows to a specific date, it does so with this lousy execution plan:

SQL Server has to perform an Index Scan, or in other words has to check every single page of this index, to find our ‘2017-08-19’ date value.

SQL Server does this because it can’t immediately look at the value in the index and see if it is equal to the ‘2017-08-19’ date we supplied — we told it first to convert every value in our column/index to a CHAR(10) date string so that it can be compared as a string.

Since SQL Server has to first convert every single date in our column/index to a CHAR(10) string, that means it ends up having to read every single page of our index to do so.

The better option here would be to leave the column/index value as a datetime2 datatype and instead convert the right-hand side of the comparison to a datetime2:
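Sketching the same filter with the conversion moved off of the column:

    SELECT Name
    FROM dbo.CoffeeInventory
    WHERE CreateDate = CAST('2017-08-19' AS DATETIME2);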

Alternatively, SQL Server is smart enough to do this conversion implicitly for us if we just leave our ‘2017-08-19’ date as a string:
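Like so:

    SELECT Name
    FROM dbo.CoffeeInventory
    WHERE CreateDate = '2017-08-19';  -- the string literal gets converted once, not the column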

In this scenario SQL gives us an Index Seek because it doesn’t have to modify any values in the column/index in order to be able to compare it to the datetime2 value that ‘2017-08-19’ got converted to.

This means SQL only has to read what it needs to output to the results. Much more efficient.

One more example

Based on the last example we can assume that any function, explicit or implicit, that is running on the column side of an operator will result in a query that cannot make use of index seeks, making it non-SARGable.

That means that instead of doing something non-SARGable like this:
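For example (YEAR() here is an illustrative stand-in for any function on the column):

    SELECT Name
    FROM dbo.CoffeeInventory
    WHERE YEAR(CreateDate) = 2017;  -- function on the column: scan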

We want to make it SARGable by doing this instead:
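Same filter, rewritten so the column is left alone:

    SELECT Name
    FROM dbo.CoffeeInventory
    WHERE CreateDate >= '2017-01-01' AND CreateDate < '2018-01-01';  -- column untouched: seek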

In short, keep in mind whether SQL Server will have to modify the data in a column/index in order to compare it — if it does, your query probably isn’t SARGable and you are going to end up scanning instead of seeking.

OK, non-SARGable queries are bad…how do I check if I have any on my server?

The script below looks at cached query plans and searches them for any table or index scans. Next, it looks for scalar operators, and if it finds any it means we have ourselves a non-SARGable query. The fix is then to rewrite the query to be SARGable or add a missing index.
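A rough sketch of that approach, not necessarily the author’s exact script:

    -- Find cached plans where a table/index scan contains a scalar operator in its subtree
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP 20
        st.text AS QueryText,
        qp.query_plan
    FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
    WHERE qp.query_plan.exist('//RelOp[@LogicalOp="Table Scan" or @LogicalOp="Index Scan" or @LogicalOp="Clustered Index Scan"]//ScalarOperator') = 1;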

 

I’ve found this script useful for myself, but if you find any issues with it please let me know, thanks!

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Unexpected SQL Server Performance Killers #2

Photo by Jakob Owens on Unsplash

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of “things I didn’t know I was doing wrong for years.”

Prefer video? Watch this post on YouTube.

Last week we discussed how implicit conversions could be one reason why your meticulously designed indexes aren’t getting used.

Today let’s look at another reason: parameter sniffing.

Here’s the key: Parameter sniffing isn’t always a bad thing.

Most of the time it’s good: it means SQL Server is caching and reusing query plans to make your queries run faster.

Parameter sniffing only becomes a problem when the cached plan isn’t anywhere close to being the optimal plan for given input parameters.

So what’s parameter sniffing?

Let’s start with our table dbo.CoffeeInventory which you can grab from GitHub.

The key things to know about this table are that:

  1. We have a nonclustered index on our Name column.
  2. The data is not distributed evenly (we’ll see this in a minute)

Now, let’s write a stored procedure that will return a filtered list of coffees in our table, based on the country. Since there is no specific Country column, we’ll write it so it filters on the Name column:
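A sketch of the procedure (the procedure name is a stand-in; the LIKE prefix filter approximates “filter by country”):

    CREATE PROCEDURE dbo.GetCoffeeByCountry @ParmCountry VARCHAR(30)
    AS
    BEGIN
        SELECT Name, Price
        FROM dbo.CoffeeInventory
        WHERE Name LIKE @ParmCountry + '%';
    END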

Let’s take a look at parameter sniffing in action, then we’ll take a look at why it happens and how to solve it.
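First, two executions, with ‘Costa Rica’ going first (the order turns out to matter):

    EXEC dbo.GetCoffeeByCountry @ParmCountry = 'Costa Rica';
    EXEC dbo.GetCoffeeByCountry @ParmCountry = 'Ethiopia';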

Running the above statements gives us identical execution plans using table scans:

In this case we explicitly specified the parameter @ParmCountry. Sometimes SQL will parameterize simple queries on its own.

That’s weird. We have two query executions, they are using the same plan, and neither plan is using our nonclustered index on Name!

Let’s step back and try again. First, clear the query plan cache for this stored procedure:
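On a test instance, the sledgehammer works:

    DBCC FREEPROCCACHE;  -- wipes the whole plan cache; fine on a test box, don't do this in production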

Next, execute the same stored procedure with the same parameter values, but this time with the ‘Ethiopia’ parameter value first. Look at the execution plan:
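Same two executions, opposite order:

    EXEC dbo.GetCoffeeByCountry @ParmCountry = 'Ethiopia';
    EXEC dbo.GetCoffeeByCountry @ParmCountry = 'Costa Rica';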

Now our nonclustered index on Name is being utilized.  Both executions still receive the same plan (albeit a different plan than before).

We didn’t change anything with our stored procedure code, only the order that we executed the query with different parameters.

What the heck is going on here!?

This is an example of parameter sniffing. The first time a stored procedure (or query) is run on SQL Server, SQL will generate an execution plan for it and store that plan in the query plan cache:

All subsequent executions of that same query will go to the query cache to reuse that same initial query plan — this saves SQL Server time from having to regenerate a new query plan.

Note: A query with different values passed as parameters still counts as the “same query” in the eyes of SQL Server.

In the case of the examples above, the first time the query was executed was with the parameter for “Costa Rica”. Remember when I said this dataset was heavily skewed? Let’s look at some counts:
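A quick way to eyeball the skew:

    SELECT Name, COUNT(*) AS NumberOfRows
    FROM dbo.CoffeeInventory
    GROUP BY Name
    ORDER BY COUNT(*) DESC;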

“Costa Rica” has more than 10,000 rows in this table, while all other country names are in the single digits.

This means that when we executed our stored procedure for the first time, SQL Server generated an execution plan that used a table scan because it thought this would be the most efficient way to retrieve 10,003 of the 10,052 rows.

This table scan query plan is only optimal for ‘Costa Rica’.  Passing in any other country name into the stored procedure would return only a handful of records, making it more efficient for SQL Server to use our nonclustered index.

However, since the Costa Rica plan was the first one to run, and therefore is the one that got added to the query plan cache, all other executions ended up using the same table scan execution plan.

After clearing our cached execution plan using DBCC FREEPROCCACHE, we executed our stored procedure again but with ‘Ethiopia’ as our parameter. SQL Server determined that a plan with an index seek is optimal to retrieve only 6 of the 10,052 rows in the table. It then cached that Index Seek plan, which is why the second time around the ‘Costa Rica’ parameter received the execution plan with Index Seek.

Ok, so how do I prevent parameter sniffing?

This question should really be rephrased as “how do I prevent SQL Server from using a sub-optimal plan from the query plan cache?”

Let’s take a look at some of the techniques.

1. Use WITH RECOMPILE or OPTION (RECOMPILE)

We can simply add these query hints to either our EXEC statement:
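Like this:

    EXEC dbo.GetCoffeeByCountry @ParmCountry = 'Ethiopia' WITH RECOMPILE;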

or to our stored procedure itself:
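A sketch using a statement-level OPTION (RECOMPILE) (adding WITH RECOMPILE to the procedure definition itself behaves similarly):

    ALTER PROCEDURE dbo.GetCoffeeByCountry @ParmCountry VARCHAR(30)
    AS
    BEGIN
        SELECT Name, Price
        FROM dbo.CoffeeInventory
        WHERE Name LIKE @ParmCountry + '%'
        OPTION (RECOMPILE);
    END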

What the RECOMPILE hint does is force SQL Server to generate a new execution plan every time these queries run.

Using RECOMPILE eliminates our parameter sniffing problem because SQL Server will regenerate the query plan every single time we execute the query.

The disadvantage here is that we lose all benefit from having SQL Server save CPU cycles by caching execution plans.

If your parameter sniffed query is getting run frequently, RECOMPILE is probably a bad idea because you will encounter a lot of overhead to generate the query plan regularly.

If your parameter sniffed query doesn’t get run often, or if the query doesn’t run often enough to stay in the query plan cache anyway, then RECOMPILE is a good solution.

2. Use the OPTIMIZE FOR query hint

Another option we have is to add either one of the following hints to our query. One of these would get added to the same location as OPTION (RECOMPILE) did in the above stored procedure:
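In sketch form, using our @ParmCountry parameter:

    OPTION (OPTIMIZE FOR (@ParmCountry UNKNOWN))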

or
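    OPTION (OPTIMIZE FOR (@ParmCountry = 'Ethiopia'))  -- optimize for a specific, representative value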

OPTIMIZE FOR UNKNOWN will use a query plan that’s generated from the average distribution stats for that column/index.  Oftentimes it results in an average or bad execution plan so I don’t like using it.

OPTIMIZE FOR VALUE creates a plan using whatever parameter value specified. This is great if you know your queries will be retrieving data that’s optimized for the value you specified most of the time.

In our examples above, if we know the value ‘Costa Rica’ is rarely queried, we might optimize for index seeks. Most queries will then run the optimal cached query plan and we’ll only take a hit when ‘Costa Rica’ is queried.

3. IF/ELSE

This solution allows for ultimate flexibility. Basically, you create different stored procedures that are optimized for different values. Those stored procedures have their plans cached, and then an IF/ELSE statement determines which procedure to run for a passed in parameter:
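A sketch of the pattern (the _Scan and _Seek procedures are hypothetical copies of our original procedure, each warmed up with a different kind of parameter value):

    CREATE PROCEDURE dbo.GetCoffeeByCountry_Router @ParmCountry VARCHAR(30)
    AS
    BEGIN
        IF @ParmCountry = 'Costa Rica'
            EXEC dbo.GetCoffeeByCountry_Scan @ParmCountry;  -- plan cached from the scan-friendly value
        ELSE
            EXEC dbo.GetCoffeeByCountry_Seek @ParmCountry;  -- plan cached from seek-friendly values
    END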

This option is more work (How do you determine what the IF condition should be?  What happens when more data is added to the table over time and the distribution of data changes?) but will give you the best performance if you want your plans to be cached and be optimal for the data getting passed in.

Conclusion

  1. Parameter sniffing is only bad when your data values are unevenly distributed and cached query plans are not optimal for all values.
  2. SQL Server caches the query plan that is generated from the first run of a query/stored procedure with whatever parameter values were used during that first run.
  3. Using the RECOMPILE hint is a good solution when your queries aren’t getting run often or aren’t staying in the query cache most of the time anyway.
  4. The OPTIMIZE FOR hint is good to use when you can specify a value that will generate a query plan that is efficient for most parameter values and are OK with taking a hit for a sub-optimal plan on infrequently queried values.
  5. Using complex logic (like IF/ELSE) will give you ultimate flexibility and performance, but will also be the worst for long term maintenance.

 

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Unexpected SQL Server Performance Killers #1

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of “things I didn’t know I was doing wrong for years.”


Have you ever encountered a query that runs slowly, even though you’ve created indexes for it?

There’s a few different reasons why this may happen. The one I see most frequently happens in the following scenario.

I’ll have an espresso please

Let’s say I have a table dbo.CoffeeInventory of coffee beans and prices that I pull from my favorite green coffee bean supplier each week. It looks something like this:
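A sketch of the table definition; the detail that matters for this post is Price being stored as a string:

    CREATE TABLE dbo.CoffeeInventory (
        Name VARCHAR(100),
        Price VARCHAR(5),  -- numeric data crammed into a string type
        CreateDate DATETIME2
    );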

If you want to follow along, you can get this data set from this GitHub Gist

I want to be able to efficiently query this table and filter on price, so next I create an index like so:
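In sketch form:

    CREATE CLUSTERED INDEX CL_CoffeeInventory_Price ON dbo.CoffeeInventory (Price);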

Now, I can write my query to find out what coffee prices are below my willingness to pay:
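Something like:

    SELECT Name, Price
    FROM dbo.CoffeeInventory
    WHERE Price < 6.75;  -- comparing a varchar column to a numeric literal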

You would expect this query to be blazing fast and use a clustered index seek, right?

WRONG!

What the heck?

Why is SQL scanning the table when I added a clustered index on the column that I am filtering in my predicate? That’s not how it’s supposed to work!

Well dear reader, if we look a little bit closer at the table scan operation, we’ll notice a little something called CONVERT_IMPLICIT:

CONVERT_IMPLICIT: ruiner of fast queries

What is CONVERT_IMPLICIT doing? Well as it implies, it’s having to convert some data as it executes the query (as opposed to me having specified an explicit CAST() or CONVERT() function in my query).

The reason it needs to do this is because I defined my Price column as a VARCHAR(5):

Who put numeric data into a string datatype? Someone who hasn’t had their coffee yet today.

In my query however, I’m doing a comparison against a number WHERE Price < 6.75. SQL Server is saying it doesn’t know how to compare a string to a number, so it has to convert the VARCHAR string to a NUMERIC(3,2).

This is painful.

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can’t seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn’t only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine

That’s a lot of orange circles/implicit conversions!

How can I query my coffee faster?

Well in this scenario, we have two options.

  1. Fix the datatype of our table to align with the data actually being stored in it (data stewards love this).
  2. Not cause SQL Server to convert every row in the column.

Number 1 above is self-explanatory, and the better option if you can do it. However, if you aren’t able to modify the column type, you are better off writing your query like this:
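That is, compare strings to strings:

    SELECT Name, Price
    FROM dbo.CoffeeInventory
    WHERE Price < '6.75';  -- string vs string: no conversion, and the index can be used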

Since we do a comparison of equivalent datatypes, SQL Server doesn’t need to do any conversions and our index gets used. Woo-hoo!

What about the rest of my server?

Remember that chart above? There are a lot of different data comparisons that can force a painful column-side implicit conversion by SQL Server.

Fortunately, Jonathan Kehayias has written a great query that helps you find column-side implicit conversions by querying the plan cache. Running his query is a great way to identify most of the implicit conversions happening in your queries so you can go back and fix them — and then rejoice in your improved query performance!

 

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

“Bordeaux, The Grand Theatre” by Stefano Montagner is licensed under CC BY-NC-ND 2.0

Have you ever needed to look at what data in a table used to look like?

If you have, it probably took a knuckle-cracking filled session of writing group-by statements, nested sub-queries, and window functions to write your time-travelling query.

Sorry for your lost day of productivity — I’ve been there too.

Fortunately for us, SQL Server 2016 introduces a new feature to make our point-in-time analysis queries easy to write: temporal tables.

Temporal Tables? Are Those The Same As Temporary Tables?

Don’t let the similar sounding name fool you: “temporal” <> “temporary”.

Temporal tables consist of two parts:

  1. The temporal table — this is the table that contains the current values of your data.
  2. The historical table — this table holds all of the previous values that at some point existed in your temporal table.

You might have created a similar setup yourself in previous versions of SQL using triggers. However, using a temporal table is different from this because:

  1. You don’t need to write any triggers/stored procedures! All of the history tracking is done automatically by SQL Server.
  2. Retrieving the data uses a simple WHERE clause — no complex querying required.

I want to make my life easier by using temporal tables! Take my money and show me how!

I’m flattered by your offer, but since we are good friends I’ll let you in on these secrets for free.

First let’s create a temporal table. I’m thinking about starting up a car rental business, so let’s model it after that:
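A sketch of that table (column names and types are illustrative; the temporal-specific pieces are the ones called out in the numbered list that follows):

    CREATE TABLE dbo.CarInventory (
        CarId INT IDENTITY PRIMARY KEY,
        Year INT,
        Make VARCHAR(40),
        Model VARCHAR(40),
        Color VARCHAR(10),
        Mileage INT,
        InLot BIT NOT NULL DEFAULT 1,
        SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory));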

The key things to note with our new table above are that

  1. it contains a PRIMARY KEY.
  2. it contains two datetime2 fields, marked with GENERATED ALWAYS AS ROW START/END.
  3. it contains the PERIOD FOR SYSTEM_TIME statement.
  4. it contains the SYSTEM_VERSIONING = ON property with the (optional) historical table name (dbo.CarInventoryHistory).

If we query our newly created tables, you’ll notice our column layouts are identical:
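For example:

    SELECT * FROM dbo.CarInventory;
    SELECT * FROM dbo.CarInventoryHistory;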

Let’s fill it with the choice car of car rental agencies all across the U.S. — the Chevy Malibu:
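Illustrative data:

    INSERT INTO dbo.CarInventory (Year, Make, Model, Color, Mileage, InLot)
    VALUES (2017, 'Chevy', 'Malibu', 'Black', 0, 1),
           (2017, 'Chevy', 'Malibu', 'Silver', 0, 1);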

Although we got some unassuming car models, at least we can express our individuality with two different paint colors!

In all of the remaining screen shots, the top result is our temporal table dbo.CarInventory and the bottom result is our historical table dbo.CarInventoryHistory.

You’ll notice that since we’ve only inserted one row for each of our cars, there’s no row history yet and therefore our historical table is empty.

Let’s change that by getting some customers and renting out our cars!
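Both cars leave the lot:

    UPDATE dbo.CarInventory SET InLot = 0;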

Now we see our temporal table at work: we updated the rows in dbo.CarInventory and our historical table was automatically updated with our original values as well as timestamps for how long those rows existed in our table.

After a while, our customers return their rental cars:
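The mileage numbers here match the story below:

    UPDATE dbo.CarInventory SET InLot = 1, Mileage = 73  WHERE CarId = 1;
    UPDATE dbo.CarInventory SET InLot = 1, Mileage = 488 WHERE CarId = 2;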

It’s totally possible for someone to have driven 73 or 488 miles in a Chevy Malibu in under 4 minutes…ever hear the phrase “drive it like a rental”?

Our temporal table shows the current state of our rental cars: the customers have returned the cars back to our lot and each car has accumulated some mileage.

Our historical table meanwhile got a copy of the rows from our temporal table right before our last UPDATE statement. It’s automatically keeping track of all of this history for us!

Continuing on, business is going well at the car rental agency. We get another customer to rent our silver Malibu:
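Continuing the sketch (CarId 2 is our silver Malibu):

    UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2;  -- the silver Malibu heads back out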

Unfortunately, our second customer gets into a crash and destroys our car:
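The totaled car comes out of inventory:

    DELETE FROM dbo.CarInventory WHERE CarId = 2;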

The customer walked away from the crash unscathed; the same can not be said for our profits.

With the deletion of our silver Malibu, our test data is complete.

Now that we have all of this great historically tracked data, how can we query it?

If we want to reminisce about better times when both cars were damage free and we were making money, we can write a query using FOR SYSTEM_TIME AS OF to show us what our table looked like at that point in the past:
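A sketch (the timestamp is a placeholder for a moment before the crash):

    SELECT *
    FROM dbo.CarInventory
    FOR SYSTEM_TIME AS OF '2017-05-18 23:49:50'
    ORDER BY CarId;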

The good old days.

And if we want to do some more detailed analysis, like what rows have been deleted, we can query both temporal and historical tables normally as well:
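For example, a sketch that surfaces deleted rows (anything in history with no matching current row):

    SELECT h.*
    FROM dbo.CarInventoryHistory h
    WHERE NOT EXISTS (SELECT 1 FROM dbo.CarInventory c WHERE c.CarId = h.CarId);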

C̶o̶l̶l̶i̶s̶i̶o̶n̶ Conclusion

Even with my car rental business not working out, at least we were able to see how SQL Server’s temporal tables helped us keep track of our car inventory data.

I hope you got as excited as I did the first time I saw temporal tables in action, especially when it comes to querying with FOR SYSTEM_TIME AS OF. Long gone are the days of needing complicated queries to rebuild data for a certain point in time.

 

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!