
Two Words For Faster Scalar Functions

Watch this week's video on YouTube

WITH SCHEMABINDING optimizations were added all the way back in SQL Server 2005.  So why bother talking about them in 2018?

Because no one is taking advantage of them!

Ok, maybe that's a little unfair.  I'm sure some people take advantage of the optimizations, but most code I see posted online that could benefit doesn't include it.  So let's talk about an easy way for some of our non-data-accessing scalar functions to get a performance boost.

WITH SCHEMABINDING

When you create a function or view, you can add the WITH SCHEMABINDING option to prevent the database objects that the view/function references from being modified underneath it.  This is a handy feature that stops you from making a table or column change that would break a dependent view or function.
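For example, here's a quick sketch of that protection in action (the table and view here are purely illustrative):

CREATE TABLE dbo.Orders (OrderId INT, OrderDate DATETIME2);
GO
CREATE VIEW dbo.RecentOrders
WITH SCHEMABINDING
AS
SELECT OrderId, OrderDate FROM dbo.Orders;
GO
-- This fails: OrderDate is referenced by a schema-bound object
ALTER TABLE dbo.Orders DROP COLUMN OrderDate;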

And while that protection is great on its own, what's even better is that the SQL Server optimizer can take some shortcuts when it knows one of your non-data-accessing scalar functions is schema bound.

For example, let's say we have these two functions.  You'll notice the second one includes the WITH SCHEMABINDING syntax:

DROP FUNCTION IF EXISTS dbo.UDF_RemoveSpaces_NotSchemaBound;
GO
CREATE FUNCTION dbo.UDF_RemoveSpaces_NotSchemaBound(@input VARCHAR(100))
    RETURNS VARCHAR(100)
BEGIN
    RETURN REPLACE(@input,' ','')
END;
GO

DROP FUNCTION IF EXISTS dbo.UDF_RemoveSpaces_SchemaBound;
GO
CREATE FUNCTION dbo.UDF_RemoveSpaces_SchemaBound(@input VARCHAR(100)) 
    RETURNS VARCHAR(100) WITH SCHEMABINDING
BEGIN
    RETURN REPLACE(@input,' ','')
END;
GO

When SQL Server executes a function, by default it checks whether the function has any database objects it needs to access.  For our example functions above, this is a waste of time since neither function accesses any data.

The WITH SCHEMABINDING option forces SQL Server to take note at function creation time of whether any data access is occurring.  This allows SQL Server to skip that check at run time, letting the function execute significantly faster:

SET STATISTICS IO, TIME ON;

SELECT dbo.UDF_RemoveSpaces_NotSchemaBound('Oh yeah') AS CleanValue
INTO #Temp1
FROM
    (SELECT * FROM master..spt_values WHERE number < 500) t1 
    CROSS JOIN (SELECT * FROM master..spt_values WHERE number < 500) t2;
/*
 SQL Server Execution Times:
   CPU time = 1594 ms,  elapsed time = 1977 ms.
*/

SELECT dbo.UDF_RemoveSpaces_SchemaBound('Oh yeah') AS CleanValue
INTO #Temp2
FROM
    (SELECT * FROM master..spt_values WHERE number < 500) t1 
    CROSS JOIN (SELECT * FROM master..spt_values WHERE number < 500) t2;
/*
SQL Server Execution Times:
   CPU time = 62 ms,  elapsed time = 59 ms.
*/

Take a look at those CPU times: 1594ms vs 62ms!  SQL Server saves a LOT of time by not having to verify the underlying data sources at run time (for a more in-depth explanation of how SQL Server checks whether a function accesses data, I highly recommend reading this StackOverflow answer by Paul White).

WITH SCHEMABINDING also provides a performance optimization for queries that would normally need to implement Halloween Protection.  Halloween Protection essentially prevents SQL Server from modifying the same records more than once, and it's usually implemented by the addition of a spool operator to the execution plan:

UPDATE #Temp1 SET CleanValue = dbo.UDF_RemoveSpaces_NotSchemaBound('Oh yeah');
UPDATE #Temp2 SET CleanValue = dbo.UDF_RemoveSpaces_SchemaBound('Oh yeah');


If you compare the two execution plans, you'll notice our non-schema-bound function introduces a Table Spool while our schema-bound function forgoes that addition, since SQL Server knows there won't be any potential for conflict.

In conclusion, if you have a non-data-accessing scalar function, you should always add WITH SCHEMABINDING so SQL Server can take advantage of these optimizations.
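If you want to track down existing scalar functions that aren't schema bound yet, here's a sketch of a query against the system catalog views that should find candidates:

SELECT
    OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
    OBJECT_NAME(m.object_id) AS FunctionName
FROM sys.sql_modules m
    INNER JOIN sys.objects o ON o.object_id = m.object_id
WHERE
    o.type = 'FN' -- scalar functions
    AND m.is_schema_bound = 0;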

Bert & Pinal Troubleshoot a Slow Performing SQL Server

Watch this week's video on YouTube

What happens when you run into performance tuning expert Pinal Dave?  Talk about how to troubleshoot a slow performing SQL Server of course!

This week is all about the video, so if you can't view it above, head on over to my YouTube channel to watch it there.

After watching, be sure to check out Pinal's free performance diagnostic scripts over at SQLAuthority.com.

Extracting JSON Values Longer Than 4000 Characters

Watch this week's video on YouTube

A while back I built an automated process that parses JSON strings into a relational format.

Up until recently this process had been working great: my output table had all of the data I was expecting, neatly parsed into the correct rows and columns.

Last week, however, I noticed an error in the output table.  One row that was supposed to have a nicely parsed JSON value for a particular column had an ugly NULL instead.

Truncated?

First I checked my source JSON string - it had the "FiveThousandAs" property I was looking for:

DECLARE @json nvarchar(max) = N'{
    "Id" : 1,
    "FiveThousandAs" : "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "FourAs" : "aaaa"
}';

So the source data was fine.

I checked the table column I was inserting into as well and confirmed it was defined as nvarchar(max), so no problem there.

The last thing I checked was the query I was using:

SELECT JSON_VALUE(@json, '$.FiveThousandAs')

If I run that on its own, I reproduce the NULL I was seeing inserted into my table.


JSON_VALUE is limiting

After a little bit more research, I discovered that the return type of JSON_VALUE is limited to 4000 characters.  Since JSON_VALUE runs in lax mode by default, if the extracted value is longer than 4000 characters it fails silently and returns NULL.

To force an error in future code I could use SELECT JSON_VALUE(@json, 'strict $.FiveThousandAs') so at least I would be notified immediately of a problem with my query/data (via failure).
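Side by side, using the @json variable declared above:

-- Lax mode (the default): a value longer than 4000 characters comes back as NULL
SELECT JSON_VALUE(@json, '$.FiveThousandAs') AS LaxResult;

-- Strict mode: the same call raises an error instead of silently returning NULL
SELECT JSON_VALUE(@json, 'strict $.FiveThousandAs') AS StrictResult;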

Although strict mode will notify me of issues sooner, it still doesn't help me extract all of the data from my JSON property.

(Side note: I couldn't define my nvarchar(max) column as NOT NULL because for some rows the value could be NULL, but in the future I might consider adding additional database validation with a check constraint).

OPENJSON

The solution to reading the entire 5000 character value from my JSON property is to use OPENJSON:

SELECT * 
FROM OPENJSON(@json) 
WITH (FiveThousandAs nvarchar(max) '$.FiveThousandAs')

My insert query needed to be slightly refactored, but now I'm able to return values of any length (as long as they're under 2GB, the nvarchar(max) limit).
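The refactored insert might look something like this (dbo.ParsedJson is a hypothetical target table with Id and FiveThousandAs columns):

INSERT INTO dbo.ParsedJson (Id, FiveThousandAs)
SELECT Id, FiveThousandAs
FROM OPENJSON(@json)
WITH (
    Id int '$.Id',
    FiveThousandAs nvarchar(max) '$.FiveThousandAs'
);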

In hindsight, I should have used OPENJSON() from the start: not only is it capable of parsing full-length values from JSON strings, it also performs significantly faster than any of the other SQL Server JSON functions.

As a best practice, I think I'm going to use OPENJSON by default for any JSON queries to avoid problems like this in the future.

Does The Order Of Index Columns Matter?

Watch this week's video on YouTube

When beginning to learn SQL, at some point you learn that indexes can be created to help improve the performance of queries.

Creating your first few indexes can be intimidating though, particularly when trying to understand what order to put your key columns in.

Today we'll look at how row store indexes work to understand whether index column order matters.

Heap: Stack of Pages

Imagine a stack of loose-leaf pages.  This collection of pages is our table.

Each page has information about a bird on it - the bird's name, picture, description, habitat, migration patterns, visual markings, etc...  You can think of each of these pages as a row of data.

The problem with this stack of pages is that there is no enforced order: it's a heap.  Without any enforced order, searching for individual birds is time consuming; in order to find a particular bird, for example a blue jay, you would have to go through the stack of pages one at a time until you find the blue jay page.

The scanning doesn't stop there though.  Even though we found a blue jay page, there's no way for us to guarantee that there are no other blue jay pages in the stack.  This means we have to continue flipping through every page until we finish searching through the whole heap of pages.

Having to do this process every single time we need to retrieve data from our bird table is painful.  To make our job easier, we can define and enforce an order on the data by defining a clustered index.

Clustered Index: Bound Pages

To make searching through our pages easier, we sort all of the pages by bird name and glue on a binding.  This book binding now keeps all of our pages in alphabetical order by bird name.

The SQL version of a book binding is a clustered index.  The clustered index is not an additional object to our data - it is that same exact table data, but now with an enforced sort order.
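In T-SQL terms, the book binding might look something like this (a sketch using a hypothetical bird table):

CREATE TABLE dbo.Birds (
    BirdName VARCHAR(50) NOT NULL,
    Color VARCHAR(20) NULL,
    Description VARCHAR(1000) NULL
);

-- The clustered index IS the table data, physically sorted by bird name
CREATE CLUSTERED INDEX CIX_Birds_BirdName ON dbo.Birds (BirdName);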

Having all of our data in sorted order by bird name makes certain queries really fast and efficient - instead of having to scan through every page to find the blue jay entry, we can now quickly flip to the "B" section, then the "BL" section, then the "BLU" section, etc... until we find BLUE JAY.  This is done quickly and efficiently because we know where to find blue jays in the book because the bird names are stored in alphabetical order.

Even better, after we find the blue jay page, we flip to the next page and see a page for cardinal.  Since we know all of the entries are stored alphabetically, we know that once we get to the next bird we have found all of our blue jay pages and don't need to continue flipping through the rest of the book.

While the clustered index allows us to find birds by name quickly, it's not perfect; since the clustered index is the table, it contains every property (column) of each bird, which is a lot of data!

Having to constantly reference this large clustered index for every query can be cumbersome.  For most of our queries, we could get by with a condensed version of our bird book that contains only the most essential information.

Nonclustered Index: Cut and Copy

Let's say we want a lighter-weight version of our book that contains only the most relevant information (bird name, color, description).

We can photocopy the entire book and then cut out and keep only the pieces of information that are relevant while discarding the rest.  If we paste all of those relevant pieces of information into a new book, still sorted by bird name, we now have a second copy of our data.  This is our nonclustered index.

This nonclustered index contains all of the same birds as our clustered index, just with fewer columns.  This means we can fit multiple birds onto a page, requiring us to flip through fewer pages to find the bird we need.

If we ever need to look up additional information about a particular bird that's not in our nonclustered index, we can always go back to our giant clustered index and retrieve any information we need.

With the lighter-weight nonclustered index in-hand, we go out to the woods to start identifying some birds.

Upon spotting an unfamiliar bird in our binoculars, we can flip open the nonclustered index to identify the bird.

The only problem is, since we don't know this bird's name, our nonclustered index by bird name is of no help.  We end up having to flip through each page one at a time trying to identify the bird instead of flipping directly to the correct page.

For these types of inquiries, where we want to identify a bird but don't know the bird's name, a different index would be beneficial...

Nonclustered Index 2: Color Bugaloo

Instead of having a nonclustered index sorted by bird name, what we really need is a way to filter down to the list of potential birds quickly.

One way we can do this is to create another copy of our book, still containing just bird names, colors, and descriptions, but this time ordering the pages by color first, then bird name.

When trying to identify an unknown bird, we can first limit the number of pages to search through by filtering on the bird's color.  In our case, color is a highly selective trait, since it filters down our list of potential birds to only a small subset of the whole book.  In our blue jay example, this means we would find the small subset of pages that contain blue birds, and then just check each one of those pages individually until we find the blue jay.
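Continuing the hypothetical dbo.Birds sketch from earlier, our two copies of the book might look like this as nonclustered indexes:

-- Our first copy: sorted by bird name
CREATE NONCLUSTERED INDEX IX_Birds_Name
ON dbo.Birds (BirdName) INCLUDE (Color, Description);

-- Our second copy: sorted by color first, then bird name
CREATE NONCLUSTERED INDEX IX_Birds_Color_Name
ON dbo.Birds (Color, BirdName) INCLUDE (Description);

-- We know the name: this query can seek on IX_Birds_Name
SELECT BirdName, Color, Description FROM dbo.Birds WHERE BirdName = 'Blue Jay';

-- We only know the color: this query can seek on IX_Birds_Color_Name instead
SELECT BirdName, Color, Description FROM dbo.Birds WHERE Color = 'Blue';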

Order Matters

Indexes aren't magic; their high-performance capabilities come from the fact that they store data in a predetermined order.  If your query can utilize data stored in that order, great!

However, if your query wants to filter on color first but your index is sorted on bird name, then you'll be out of luck.  When it comes to determining which column should be the first key in your index, choose whichever one will be most selective (the one that filters you down to the smallest subset of results) for your particular query.

There's a lot more optimizing that can be done with indexes, but correctly choosing the order of columns for your index key is an essential first step.

Want to learn even more about index column order? Be sure to check out this post on cardinality.

Should You Use Index Hints?

Watch this week's video on YouTube

One of the things that the SQL Server query optimizer does is determine how to retrieve the data requested by your query.

Usually it does a pretty good job, which is great because if it didn't then we'd be spending most of our days programming sorting and joining algorithms instead of having fun actually working with our data.

Sometimes the query optimizer has a lapse in judgement and creates a less-than-efficient plan, requiring us to step in and save the day.

Index Hints Give You Control

One way to "fix" a poor performing plan is to use an index hint.  While we normally have no control over how SQL Server retrieves the data we requested, an index hint forces the  query optimizer to use the index specified in the hint to retrieve the data (hence, it's really more of a "command" than a "hint").

Sometimes when I feel like I'm losing control I like using an index hint to show SQL Server who's boss.  I occasionally will also use index hints when debugging poor performing queries because it allows me to confirm whether using an alternate index would improve performance without having to overhaul my code or change any other settings.

...But Sometimes That's Too Much Power

While I like using index hints for short-term debugging scenarios, that's about the only time they should be used because they can create some pretty undesirable outcomes.

For example, let's say I have this nice simple query and index here:

CREATE INDEX IX_OwnerUserId_CreationDate_Includes
ON dbo.Posts (OwnerUserId, CreationDate) INCLUDE (AcceptedAnswerId, ClosedDate, CommentCount, FavoriteCount, LastActivityDate);

SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts
WHERE
    OwnerUserId < 1000

This index was specifically created for a different query running on the Posts table, but it will also get used by the simple query above.

Executing this query without any hints causes SQL Server to use it anyway (since it's a pretty good index for the query), and we get decent performance: only 1002 logical reads.

I wish all of my execution plans were this simple.

Let's pretend we don't trust the SQL Server optimizer to always choose this index, so instead we force it to use it by adding a hint:

SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts WITH (INDEX(IX_OwnerUserId_CreationDate_Includes))
WHERE
    OwnerUserId < 1000

With this hint, the query will perform exactly the same: 1002 logical reads, a good index seek, etc...

But what happens if in the future a better index gets added to the table?

CREATE INDEX IX_OwnerUserId_AcceptedAnswerId_Includes
ON dbo.Posts (OwnerUserId, AcceptedAnswerId) INCLUDE (LastEditorUserId, ParentId);

If we run the query WITHOUT the index hint, we'll see that SQL Server actually chooses this new index because it's smaller and we can get the data we need in only 522 logical reads:

This execution plan looks the same, but you'll notice the smaller, more data-dense index is being used.

If we had let SQL Server do its job, it would have given us a great performing query!  Instead, we decided to intervene and hint (i.e. force) it to use a sub-optimal index.

Things Can Get Worse

The above example is pretty benign - sure, without the hint SQL Server would have read about half as many pages, but this isn't a drastic difference in this scenario.

What could be disastrous is if, because of the hint, the query optimizer decides to create a totally different plan that isn't nearly as efficient.  Or if one day someone drops the hinted index, causing the query with the hint to outright fail.


Index hints can be nice to use in the short term for investigating, testing, and debugging.  However, they are almost never the correct long-term solution for fixing query performance.

Instead, it's better to look for the root-cause of a poor performing query: maybe you need to rebuild stats on an index or determine if the cardinality estimator being used is not ideal.  You might also benefit from rewriting a terribly written query.
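For example, here are a couple of root-cause fixes that are often better than pinning an index with a hint (sketches only; whether they help depends on the actual root cause):

-- Refresh statistics on the table the query reads from
UPDATE STATISTICS dbo.Posts WITH FULLSCAN;

-- Or rebuild a potentially fragmented index
ALTER INDEX IX_OwnerUserId_CreationDate_Includes ON dbo.Posts REBUILD;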

Any of these options will likely lead to a better, flexible, long-term solution rather than forcing SQL Server to use the same hard-coded, potentially sub-optimal index forever.