How To Make Your Queries Perform Like They Used To

Published Tue 19 September 2017 in SQL > Performance Tuning

chad-kirchoff-202730

In the ideal world, you fully test how your SQL Server will handle upgrading to the latest version. You're able to catch and mitigate any performance surprises early on.

In the real world, you might not be able to perform a good test before upgrading. You only learn about poor performance once you upgrade in production.

Does this scenario sound familiar? It's what I've been thinking about recently as I have to deal with query performance issues after upgrading to SQL Server 2016.

Coming from SQL Server 2012, I know the new cardnality estimator added in 2014 is a major source of the problems I'm experiencing. While the new cardinality estimator improves performance of some queries, it has also made made some of my queries take hours to run instead of seconds.

Long-term, the solution is to revisit these queries, their stats, and the execution plans being generated for them to see what can be rewritten for better performance.

But ain't nobody got time for that (at least when facing performance crippling production issues).

Short-term, put a band-aid on to stop the bleeding.

I could change the compatibility mode of the database to revert back to SQL Server 2012 (before the new cardinality estimator was introduced), but then I miss out on being able to use all of the new SQL Server 2016 improvements just because a few queries are misbehaving.

I could enable trace flag 9481 to have my queries use the old cardinality estimator, however as a developer I probably don't have access to play with trace flags (and for good reason).

Starting with 2016 SP1, what I can do is use the legacy cardinality estimator query hint:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

This hint is great because it doesn't require developers to have any special permissions. It also allows SQL to use the old cardinality estimator for poor performing queries only - the rest of the server gets to benefit from the improvements brought on by the new cardinality estimator.

With time, I can revisit the queries that are using this hint to try to improve their performance, but in the mean time it's a great way to fix regressed query performance due to the new cardinality estimator.

How to Search and Destroy Non-SARGable Queries on Your Server

Published Tue 22 August 2017 in SQL > Development

Unexpected SQL Server Performance Killers #3

Photo by Cibi Chakravarthi on Unsplash

Watch this week's video on YouTube

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Looking for a script to find non-SARGable queries on your server? Scroll to the bottom of this post.

What is a "SARGable" query?

Just because you add an index to your table doesn't mean you get immediate performance improvement. A query running against that table needs to be written in such a way that it actually takes advantage of that index.

SARGable, or "Search Argument-able", queries therefore are queries that are capable of utilizing indexes.

Examples please!

Okay let's see some examples of SARGable and non-SARGable queries using my favorite beverage data.

There are non-clustered indexes on the Name and CreateDate columns

First, let's look at a non-SARGable query:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CONVERT(CHAR(10),CreateDate,121)  = '2017-08-19'

Although this query correctly filters our rows to a specific date, it does so with this lousy execution plan:

903f0-1ci7exqc5ao00i4bfm24ruw

SQL Server has to perform an Index Scan, or in other words has to check every single page of this index, to find our '2017–08–19' date value.

SQL Server does this because it can't immediately look at the value in the index and see if it is equal to the '2017–08–19' date we supplied — we told it first to convert every value in our column/index to a CHAR(10) date string so that it can be compared as a string.

Since the SQL Server has to first convert every single date in our column/index to a CHAR(10) string, that means it ends up having to read every single page of our index to do so.

The better option here would be to leave the column/index value as a datetime2 datatype and instead convert the right hand of the operand to a datetime2:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CreateDate = CAST('2017-08-19' AS datetime2)

Alternatively, SQL Server is smart enough to do this conversion implicitly for us if we just leave our '2017–08–19' date as a string:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CreateDate = '2017-08-19'

c63f6-1ady-tfnhv8j5dnygkim34g

In this scenario SQL gives us an Index Seek because it doesn't have to modify any values in the column/index in order to be able to compare it to the datetime2 value that '2017–08–19' got converted to.

This means SQL only has to read what it needs to output to the results. Much more efficient.

One more example

Based on the last example we can assume that any function, explicit or implicit, that is running on the column side of an operator will result in a query that cannot make use of index seeks, making it non-SARGable.

That means that instead of doing something non-SARGable like this:

SELECT Name, CreateDate
FROM dbo.CoffeeInventory
WHERE DAY(CreateDate)  = 19

We want to make it SARGable by doing this instead:

SELECT Name, CreateDate
FROM dbo.CoffeeInventory
WHERE 
    CreateDate  >= '2017-08-19 00:00:00' 
    AND CreateDate < '2017-08-20 00:00:00'

In short, keep in mind whether SQL Server will have to modify the data in a column/index in order to compare it — if it does, your query probably isn't SARGable and you are going to end up scanning instead of seeking.

OK, non-SARGable queries are bad…how do I check if I have any on my server?

The script below looks at cached query plans and searches them for any table or index scans. Next, it looks for scalar operators, and if it finds any it means we have ourselves a non-SARGable query. The fix is then to rewrite the query to be SARGable or add a missing index.

-- From https://github.com/bertwagner/SQLServer/blob/master/Non-SARGable%20Execution%20Plans.sql
-- This script will check the execution plan cache for any queries that are non-SARGable.
-- It does this by finding table and index scans that contain a scalar operators

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

DECLARE @dbname SYSNAME
SET @dbname = QUOTENAME(DB_NAME());

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

SELECT
   stmt.value('(@StatementText)[1]', 'varchar(max)') AS [Query],
   query_plan AS [QueryPlan],
   sc.value('(.//Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') AS [Schema], 
   sc.value('(.//Identifier/ColumnReference/@Table)[1]', 'varchar(128)') AS [Table], 
   sc.value('(.//Identifier/ColumnReference/@Column)[1]', 'varchar(128)') AS [Column] ,
   CASE WHEN s.exist('.//TableScan') = 1 THEN 'TableScan' ELSE 'IndexScan' END AS [ScanType],
   sc.value('(@ScalarString)[1]', 'varchar(128)') AS [ScalarString]
FROM 
    sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
    CROSS APPLY stmt.nodes('.//RelOp[TableScan or IndexScan]') AS scan(s)
    CROSS APPLY s.nodes('.//ScalarOperator') AS scalar(sc)
WHERE
    s.exist('.//ScalarOperator[@ScalarString]!=""') = 1 
    AND sc.exist('.//Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1
    AND sc.value('(@ScalarString)[1]', 'varchar(128)') IS NOT NULL

I've found this script useful for myself, but if you find any issues with it please let me know, thanks!

Why Parameter Sniffing Isn't Always A Bad Thing (But Usually Is)

Published Tue 08 August 2017 in SQL > Development

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Watch this week's video on YouTube

Last week we discussed how implicit conversions could be one reason why your meticulously designed indexes aren't getting used.

Today let's look at another reason: parameter sniffing.

Here's the key: Parameter sniffing isn't always a bad thing.

Most of the time it's good: it means SQL Server is caching and reusing query plans to make your queries run faster.

Parameter sniffing only becomes a problem when the cached plan isn't anywhere close to being the optimal plan for given input parameters.

So what's parameter sniffing?

Let's start with our table dbo.CoffeeInventory which you can grab from Github.

4dedb-17uyfuaa5zzf_h5civ9ww4w

The key things to know about this table are that:

We have a nonclustered index on our Name column.
The data is not distributed evenly (we'll see this in a minute)

Now, let's write a stored procedure that will return a filtered list of coffees in our table, based on the country. Since there is no specific Country column, we'll write it so it filters on the Name column:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
    SELECT Name, Price, Description 
    FROM Sandbox.dbo.CoffeeInventory
    WHERE Name LIKE @ParmCountry + '%'
END
GO

Let's take a look at parameter sniffing in action, then we'll take a look at why it happens and how to solve it.

EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica'
EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia'

Running the above statement gives us identical execution plans using table scans:

In this case we explicitly specified the parameter @ParmCountry. Sometimes SQL will parameterize simple queries on its own.

That's weird. We have two query executions, they are using the same plan, and neither plan is using our nonclustered index on Name!

Let's step back and try again. First, clear the query plan cache for this stored procedure:

DECLARE @cache_plan_handle varbinary(44)
SELECT @cache_plan_handle = c.plan_handle
FROM 
    sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
WHERE 
    text like 'CREATE%CoffeeInventory%' 
-- Never run DBCC FREEPROCCACHE without a parameter in production unless you want to lose all of your cached plans...
DBCC FREEPROCCACHE(@cache_plan_handle)

Next, execute the same stored procedure with the same parameter values, but this time with the 'Ethiopia' parameter value first. Look at the execution plan:

EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia'
EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica'

9a86e-1en_loooe8aojxr0jonohzg

Now our nonclustered index on Name is being utilized. Both queries are still receiving the same (albeit different) plan.

We didn't change anything with our stored procedure code, only the order that we executed the query with different parameters.

What the heck is going on here!?

This is an example of parameter sniffing. The first time a stored procedure (or query) is ran on SQL server, SQL will generate an execution plan for it and store that plan in the query plan cache:

SELECT
    c.usecounts,
    c.cacheobjtype,
    c.objtype,
    c.plan_handle,
    c.size_in_bytes,
    d.name,
    t.text,
    p.query_plan
FROM 
    sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
    CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) p
    INNER JOIN sys.databases d
    ON t.dbid = d.database_id
WHERE 
    text like 'CREATE%CoffeeInventory%'

2c94d-1w8c6qvsh_ca3jumixdbnjw

All subsequent executions of that same query will go to the query cache to reuse that same initial query plan — this saves SQL Server time from having to regenerate a new query plan.

Note: A query with different values passed as parameters still counts as the "same query" in the eyes of SQL Server.

In the case of the examples above, the first time the query was executed was with the parameter for "Costa Rica". Remember when I said this dataset was heavily skewed? Let's look at some counts:

SELECT 
    LEFT(Name,CHARINDEX(' ',Name)) AS Country, 
    COUNT(*) AS CountryCount 
FROM dbo.CoffeeInventory 
GROUP BY 
    LEFT(Name,CHARINDEX(' ',Name))

109ae-1rdzlvzj99ccxpni3mwjnaw

"Costa Rica" has more than 10,000 rows in this table, while all other country names are in the single digits.

This means that when we executed our stored procedure for the first time, SQL Server generated an execution plan that used a table scan because it thought this would be the most efficient way to retrieve 10,003 of the 10,052 rows.

This table scan query plan is only optimal for Costa Rica . Passing in any other country name into the stored procedure would return only a handful of records, making it more efficient for SQL Server to use our nonclustered index.

However, since the Costa Rica plan was the first one to run, and therefore is the one that got added to the query plan cache, all other executions ended up using the same table scan execution plan.

After clearing our cached execution plan using DBCC FREEPROCCACHE, we executed our stored procedure again but with 'Ethiopia' as our parameter. SQL Server determined that a plan with an index seek is optimal to retrieve only 6 of the 10,052 rows in the table. It then cached that Index Seek plan, which is why the second time around the 'Costa Rica' parameter received the execution plan with Index Seek.

Ok, so how do I prevent parameter sniffing?

This question should really be rephrased as "how do I prevent SQL Server from using a sub-optimal plan from the query plan cache?"

Let's take a look at some of the techniques.

1. Use WITH RECOMPILE or OPTION (RECOMPILE)

We can simply add these query hints to either our EXEC statement:

EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia' WITH RECOMPILE
EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica' WITH RECOMPILE

or to our stored procedure itself:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
    SELECT Name, Price, Description 
    FROM Sandbox.dbo.CoffeeInventory 
    WHERE Name LIKE @ParmCountry + '%'

    OPTION (RECOMPILE)

END
GO

What the RECOMPILE hint does is force SQL Server to generate a new execution plan every time these queries run.

Using RECOMPILE eliminates our parameter sniffing problem because SQL Server will regenerate the query plan every single time we execute the query.

The disadvantage here is that we lose all benefit from having SQL Server save CPU cycles by caching execution plans.

If your parameter sniffed query is getting ran frequently, RECOMPILE is probably a bad idea because you will encounter a lot of overheard to generate the query plan regularly.

If your parameter sniffed query doesn't get ran often, or if the query doesn't run often enough to stay in the query plan cache anyway, then RECOMPILE is a good solution.

2. Use the OPTIMIZE FOR query hint

Another option we have is to add either one of the following hints to our query. One of these would get added to the same location as OPTION (RECOMPILE) did in the above stored procedure:

OPTION (OPTIMIZE FOR (@ParmCountry UNKNOWN))

OPTION (OPTIMIZE FOR (@ParmCountry = 'Ethiopia'))

OPTIMIZE FOR UNKNOWN will use a query plan that's generated from the average distribution stats for that column/index. Often times it results in an average or bad execution plan so I don't like using it.

OPTIMIZE FOR VALUE creates a plan using whatever parameter value specified. This is great if you know your queries will be retrieving data that's optimized for the value you specified most of the time.

In our examples above, if we know the value 'Costa Rica' is rarely queried, we might optimize for index seeks. Most queries will then run the optimal cached query plan and we'll only take a hit when 'Costa Rica' is queried.

3. IF/ELSE

This solution allows for ultimate flexibility. Basically, you create different stored procedures that are optimized for different values. Those stored procedures have their plans cached, and then an IF/ELSE statement determines which procedure to run for a passed in parameter:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
    IF @ParmCountry = 'Costa Rica'
    BEGIN
    EXEC dbo.ScanningStoredProcedure @ParmCountry
    END
    ELSE
    BEGIN
    EXEC dbo.SeekingStoredProcedure @ParmCountry
    END
END
GO

This option is more work (How do you determine what the IF condition should be? What happens more data is added to the table over time and the distribution of data changes?) but will give you the best performance if you want your plans to be cached and be optimal for the data getting passed in.

Conclusion

Parameter sniffing is only bad when your data values are unevenly distributed and cached query plans are not optimal for all values.
SQL Server caches the query plan that is generated from the first run of a query/stored procedure with whatever parameter values were used during that first run.
Using the RECOMPILE hint is a good solution when your queries aren't getting ran often or aren't staying in the the query cache most of the time anyway.
The OPTIMIZE FOR hint is good to use when you can specify a value that will generate a query plan that is efficient for most parameter values and are OK with taking a hit for a sub-optimal plan on infrequently queried values.
Using complex logic (like IF/ELSE) will give you ultimate flexibility and performance, but will also be the worst for long term maintenance.

Are your indexes being thwarted by mismatched datatypes?

Published Tue 01 August 2017 in SQL > Performance Tuning > Indexing

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Watch this week's video on YouTube

Have you ever encountered a query that runs slowly, even though you've created indexes for it?

There's a few different reasons why this may happen. The one I see most frequently happens in the following scenario.

I'll have an espresso please

Let's say I have a table dbo.CoffeeInventory of coffee beans and prices that I pull from my favorite green coffee bean supplier each week. It looks something like this:

-- Make sure Actual Execution Plan is on
-- Let's see what our data looks like
SELECT * FROM dbo.CoffeeInventory

If you want to follow along, you can get this data set from this GitHub Gist

I want to be able to efficiently query this table and filter on price, so next I create an index like so:

CREATE CLUSTERED INDEX CL_Price ON dbo.CoffeeInventory (Price)

Now, I can write my query to find out what coffee prices are below my willingness to pay:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < 6.75

You would expect this query to be blazing fast and use a clustered index seek, right?

WRONG!

What the heck?

Why is SQL scanning the table when I added a clustered index on the column that I am filtering in my predicate? That's not how it's supposed to work!

Well dear reader, if we look a little bit closer at the table scan operation, we'll notice a little something called CONVERT_IMPLICIT:

CONVERT_IMPLICIT: ruiner of fast queries

What is CONVERT_IMPLICIT doing? Well as it implies, it's having to convert some data as it executes the query (as opposed to me having specified an explicit CAST() or CONVERT() function in my query).

The reason it needs to do this is because I defined my Price column as a VARCHAR(5):

Who put numeric data into a string datatype? Someone who hasn't had their coffee yet today.

In my query however, I'm doing a comparison against a number WHERE Price < 6.75. SQL Server is saying it doesn't know how to compare a string to a number, so it has to convert the VARCHAR string to a NUMERIC(3,2).

This is painful.

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can't seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn't only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

<https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine>

That's a lot of orange circles/implicit conversions!

How can I query my coffee faster?

Well in this scenario, we have two options.

Fix the datatype of our table to align with the data actually being stored in this (data stewards love this).
Not cause SQL Server to convert every row in the column.

Number 1 above is self-explanatory, and the better option if you can do it. However, if you aren't able to modify the column type, you are better off writing your query like this:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < '6.75'

d03ed-1uzge0e3lizkhtodyonsfmg

Since we do a comparison of equivalent datatypes, SQL Server doesn't need to do any conversions and our index gets used. Woo-hoo!

What about the rest of my server?

Remember that chart above? There are a lot of different data comparisons that can force a painful column side implicit conversion by SQL Server.

Fortunately, Jonathan Kehayias has written a great query that helps you find column side implicit conversions by querying the plan cache. Running his query is a great way to identify most of the implicit conversions happening in your queries so you can go back and fix them — and then rejoice in your improved query performance!

One SQL Cheat Code For Amazingly Fast JSON Queries

Published Tue 09 May 2017 in SQL > Development > JSON

f836d-17tk3zornsj5nirg7ai5dbq

Watch this week's video on YouTube

Recently I've been working with JSON in SQL Server 2016 a lot.

One of the hesitations many people have with using JSON in SQL Server is that they think that querying it must be really slow — SQL is supposed to excel at relational data, not string parsing right?

It turns out that performance is pretty good with the standalone SQL Server JSON functions. Even better is that it's possible to make queries against JSON data run at ludicrous speeds by using indexes on JSON parsed computed columns. In this post I want to take a look at how SQL is able to parse* with such great performance.

*"Parse" here is actually a lie —it's doing something else behind the scenes. You'll see what I mean, keep reading!

Computed Columns in SQL Server

The only way to get JSON indexes working on SQL server is to use a computed column. A computed column is basically a column that performs a function to calculate its values.

For example, let's say we have a table with some car JSON data in it:

DROP TABLE IF EXISTS dbo.DealerInventory;
CREATE TABLE dbo.DealerInventory
(
  Id int IDENTITY(1,1) PRIMARY KEY,
  Year int,
  JsonData nvarchar(300)
);

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Volkswagen", "Model" : "Golf" }');

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Honda", "Model" : "Civic" }');

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Subaru", "Model" : "Impreza" }');

SELECT * FROM dbo.DealerInventory;

/* Output:
Id    Year     JsonData
----- -------- ---------------------------------------------
1     2017     { "Make" : "Volkswagen", "Model" : "Golf" }
2     2017     { "Make" : "Honda", "Model" : "Civic" }
3     2017     { "Make" : "Subaru", "Model" : "Impreza" }
*/

We can add a new computed column to the table, "Make", which parses and extracts the Make property from each row's JSON string:

ALTER TABLE dbo.DealerInventory
ADD Make AS JSON_VALUE(JsonData, '$.Make');

SELECT * FROM dbo.DealerInventory;

/* Output:
Id Year  JsonData                                    Make
-- ----- ------------------------------------------- ----------
1  2017  { "Make" : "Volkswagen", "Model" : "Golf" } Volkswagen
2  2017  { "Make" : "Honda", "Model" : "Civic" }     Honda
3  2017  { "Make" : "Subaru", "Model" : "Impreza" }  Subaru
*/

By default, the above Make computed column is non-persisted, meaning its values are never stored to the database (persisted computed columns can also be created, but that's a topic for a different time). Instead, every time a query runs against our dbo.DealerInventory table, SQL Server will calculate the value for each row.

The performance of this isn't great — it's essentially a scalar function running for each row of our output :(. However, when you combine a computed column with an index, something interesting happens.

Time to dive in with DBCC Page

DBCC Page is an undocumented SQL Server function that shows what the raw data stored in a SQL page file looks like. Page files are how SQL Server stores its data.

In the rest of this post we'll be looking at how data pages (where the actual table data in SQL is stored) and index pages (where our index data is stored) are affected by non-persisted computed columns — and how they make JSON querying super fast.

First, let's take a look at the existing data we have. We do this by first turning on trace flag 3604 and using DBCC IND to get the page ids of our data. Additional details on the column definitions in DBCC IND and DBCC PAGE can be found in Paul Randal's blog post on the topic.

DBCC TRACEON(3604);

-- "Sandbox" is the name of my database
DBCC IND('Sandbox','dbo.DealerInventory',-1);

55ac9-1qczux6z_us9fzacehamhpa

If you look at the results above, row 2 contains our data page (indicated by PageType = 1) and the PagePID of that page is 305088 (if you are playing along at home, your PagePID is most likely something else). If we then look up that PagePID using DBCC PAGE we get something like this:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

65fdc-1jaa-flg1di52y3t2gohyxg

You can see our three rows of data highlighted in red. The important thing to note here is that our computed column of the parsed "Make" value is truly non-persisted and no where to be found, meaning it has to get generated for every row during query execution.

Now, what if we add an index to our non-persisted computed column and then run DBCC IND again:

CREATE NONCLUSTERED INDEX IX_ParsedMake ON dbo.DealerInventory (Make)

DBCC IND('Sandbox','dbo.DealerInventory',-1);

cab34-1vfyzcblexgrytsqu5miomg

You'll now notice that in addition to data page 305088 (PageType = 1), we also have an index page 305096 (PageType = 2). If we examine both the data page and the index page we see something interesting:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

DBCC PAGE('Sandbox',1,305096,3) WITH TABLERESULTS

Nothing has changed with our data page:

3348f-1ktty-u5w4iha9emrwys9zw

But our index page contains the parsed values for our "Make" column:

1b9ea-16ysi1x2ilwdo7j3tkwowlw

What does this mean? I thought non-persisted computed columns aren't saved to disk!

Exactly right: our non-persisted computed column "Make" isn't saved to the data page on the disk. However if we create an index on our non-persisted computed column, the computed value is persisted on the index page!

This is basically a cheat code for indexing computed columns.

SQL will only compute the "Make" value on a row's insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.