How to Search and Destroy Non-SARGable Queries on Your Server

Published Tue 22 August 2017 in SQL > Development

Unexpected SQL Server Performance Killers #3

Photo by Cibi Chakravarthi on Unsplash

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Looking for a script to find non-SARGable queries on your server? Scroll to the bottom of this post.

What is a "SARGable" query?

Just because you add an index to your table doesn't mean you get immediate performance improvement. A query running against that table needs to be written in such a way that it actually takes advantage of that index.

SARGable, or "Search Argument-able", queries therefore are queries that are capable of utilizing indexes.

Examples please!

Okay let's see some examples of SARGable and non-SARGable queries using my favorite beverage data.

There are non-clustered indexes on the Name and CreateDate columns

First, let's look at a non-SARGable query:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CONVERT(CHAR(10),CreateDate,121)  = '2017-08-19'

Although this query correctly filters our rows to a specific date, it does so with this lousy execution plan:

903f0-1ci7exqc5ao00i4bfm24ruw

SQL Server has to perform an Index Scan, or in other words has to check every single page of this index, to find our '2017–08–19' date value.

SQL Server does this because it can't immediately look at the value in the index and see if it is equal to the '2017–08–19' date we supplied — we told it first to convert every value in our column/index to a CHAR(10) date string so that it can be compared as a string.

Since the SQL Server has to first convert every single date in our column/index to a CHAR(10) string, that means it ends up having to read every single page of our index to do so.

The better option here would be to leave the column/index value as a datetime2 datatype and instead convert the right hand of the operand to a datetime2:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CreateDate = CAST('2017-08-19' AS datetime2)

Alternatively, SQL Server is smart enough to do this conversion implicitly for us if we just leave our '2017–08–19' date as a string:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CreateDate = '2017-08-19'

c63f6-1ady-tfnhv8j5dnygkim34g

In this scenario SQL gives us an Index Seek because it doesn't have to modify any values in the column/index in order to be able to compare it to the datetime2 value that '2017–08–19' got converted to.

This means SQL only has to read what it needs to output to the results. Much more efficient.

One more example

Based on the last example we can assume that any function, explicit or implicit, that is running on the column side of an operator will result in a query that cannot make use of index seeks, making it non-SARGable.

That means that instead of doing something non-SARGable like this:

SELECT Name, CreateDate
FROM dbo.CoffeeInventory
WHERE DAY(CreateDate)  = 19

We want to make it SARGable by doing this instead:

SELECT Name, CreateDate
FROM dbo.CoffeeInventory
WHERE 
    CreateDate  >= '2017-08-19 00:00:00' 
    AND CreateDate < '2017-08-20 00:00:00'

In short, keep in mind whether SQL Server will have to modify the data in a column/index in order to compare it — if it does, your query probably isn't SARGable and you are going to end up scanning instead of seeking.

OK, non-SARGable queries are bad…how do I check if I have any on my server?

The script below looks at cached query plans and searches them for any table or index scans. Next, it looks for scalar operators, and if it finds any it means we have ourselves a non-SARGable query. The fix is then to rewrite the query to be SARGable or add a missing index.

-- From https://github.com/bertwagner/SQLServer/blob/master/Non-SARGable%20Execution%20Plans.sql
-- This script will check the execution plan cache for any queries that are non-SARGable.
-- It does this by finding table and index scans that contain a scalar operators

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

DECLARE @dbname SYSNAME
SET @dbname = QUOTENAME(DB_NAME());

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

SELECT
   stmt.value('(@StatementText)[1]', 'varchar(max)') AS [Query],
   query_plan AS [QueryPlan],
   sc.value('(.//Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') AS [Schema], 
   sc.value('(.//Identifier/ColumnReference/@Table)[1]', 'varchar(128)') AS [Table], 
   sc.value('(.//Identifier/ColumnReference/@Column)[1]', 'varchar(128)') AS [Column] ,
   CASE WHEN s.exist('.//TableScan') = 1 THEN 'TableScan' ELSE 'IndexScan' END AS [ScanType],
   sc.value('(@ScalarString)[1]', 'varchar(128)') AS [ScalarString]
FROM 
    sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
    CROSS APPLY stmt.nodes('.//RelOp[TableScan or IndexScan]') AS scan(s)
    CROSS APPLY s.nodes('.//ScalarOperator') AS scalar(sc)
WHERE
    s.exist('.//ScalarOperator[@ScalarString]!=""') = 1 
    AND sc.exist('.//Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1
    AND sc.value('(@ScalarString)[1]', 'varchar(128)') IS NOT NULL

I've found this script useful for myself, but if you find any issues with it please let me know, thanks!

How to Automatically Purge Historical Data From a Temporal Table

Published Tue 04 July 2017 in SQL > Development > Temporal Tables

Watch this week's video on YouTube

Temporal Tables are awesome.

They make analyzing time-series data a cinch, and because they automatically track row-level history, rolling-back from an "oops" scenario doesn't mean you have to pull out the database backups.

The problem with temporal tables is that they produce a lot of data. Every row-level change stored in the temporal table's history table quickly adds up, increasing the possibility that a low-disk space warning is going to be sent to the DBA on-call.

In the future with SQL Server 2017 CTP3, Microsoft allows us to add a retention period to our temporal tables, making purging old data in a temporal table as easy as specifying:

ALTER DATABASE DatabaseName
SET TEMPORAL_HISTORY_RETENTION ON

CREATE TABLE dbo.TableName (
...
) 
WITH
( 
    SYSTEM_VERSIONING = ON      
    (
        HISTORY_TABLE = dbo.TableNameHistory,            
        HISTORY_RETENTION_PERIOD = 6 MONTHS      
    )  
);

However, until we are all on 2017 in production, we have to manually automate the process with a few scripts.

Purging old data out of history tables in SQL Server 2016

In the next few steps we are going to write a script that deletes data more than a month old from my CarInventoryHistory table:

SELECT * FROM dbo.CarInventory;
SELECT * FROM dbo.CarInventoryHistory;

04cd5-1kjmrbwqd96geq1ypp5ejaa

And now if we write our DELETE statement:

ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF  ) ;
GO

-- In the real-world we would do some DATE math here
DECLARE @OneMonthBack DATETIME2 = '2017-06-04';

DELETE FROM dbo.CarInventoryHistory WHERE SysStartTime < @OneMonthBack;

You'll notice that we first had to turn system versioning off: SQL Server won't let us delete data from a history table that is currently tracking a temporal table.

This is a poor solution however. Although the data will delete correctly from our history table, we open ourselves up to data integrity issues. If another process INSERTs/UPDATEs/DELETEs into our temporal table while the history deletion is occurring, those new INSERTs/UPDATEs/DELETEs won't be tracked because system versioning is turned off.

The better solution is to wrap our ALTER TABLE/DELETE logic in a transaction so any other queries running against our temporal table will have to wait:

-- Run this in query window #1 (delete data):
BEGIN TRANSACTION;
ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF );
GO

-- In the real-world we would do some DATE math here
DECLARE @OneMonthBack DATETIME2 = '2017-06-04';

DELETE FROM dbo.CarInventoryHistory WITH (TABLOCKX)
WHERE SysStartTime < @OneMonthBack;

-- Let's wait for 10 seconds to mimic a longer delete operation
WAITFOR DELAY '00:00:10';

--Re-enable our SYSTEM_VERSIONING
ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory));
GO

COMMIT TRANSACTION;

-- Run this in query window #2 during the same time as the above query (trying to update during deletion):
UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 4;

And the result? Our history table data was deleted while still tracking the row-level data changes to our temporal table:

60a83-1sx-gxdjlwhjanonw622vnw

All that is left to do is to throw this script into a SQL Agent job and schedule how often you want it to run.

When Is It Appropriate To Store JSON in SQL Server?

Published Tue 14 March 2017 in SQL > Development > JSON

Who needs a relational database when everything can be stored in a JSON string?

Every once in a while I hear of some technologist say that relational databases are dead; instead, a non-table based NoSQL storage format is the way of the future. SQL Server 2016 introduced JSON functionality, making it possible for some "non-SQL" data storage to make its way into the traditionally tabled-based SQL Server.

Does this mean all data in SQL Server going forward should be stored in long JSON strings? No, that would be a terrible idea. There are instances when storing JSON in SQL Server is a good choice though. In this post I want to create recommendations for when data should be stored as JSON and when it shouldn't.

Databases Should Not Be Entirely Comprised Of JSON

The screenshot below is an example of what I think some developers would do if they were given free reign in SQL Server 2016:

bb208-1zzw7xinj5wmotojplta2lw

Here we have an application database ("InventoryApp") that consists of only a single table ("dbo.Data") with three JSON NVARCHAR(MAX) columns to represent all of the data required by the app. Relationships exist between Sales, Purchases, and Customers but these are not defined on the database side.

If you are from the world of relational-SQL, you might not believe that anyone would design such a database structure. Believe me though, this is a realistic scenario. Entire companies (eg. Firebase: https://firebase.google.com/) build their services around abstracting the database layer away from developers, essentially storing entire tables or databases in large JSON strings.

Many developers like storing data this way because it is easy to deserialize JSON strings into objects in their programming languages to use in their apps. They like the fact that with JSON they can have an infinitely changing storage schema (just add new keys, values, and arrays!) so if they need a new field for their app, they can just add it in, serialize the object to a JSON string, and store it again in the database.

Obviously, going completely "NoSQL" might make short term development easier/quicker, but using SQL Server 2016 to only store data this way is a travesty: there's no way to use many of SQL Server's amazing performance, schema definition and validation, and security features.

So when is it appropriate to store JSON in SQL Server?

Appropriate Use Case #1: Error Logging

Errors happen. When they do, it's nice to be able to go back and look at the error message to see what happened.

The problem is that the structure of error messages isn't always consistent. Sometimes only the value of a single property will help identify the cause of failure. Other times, something more complex fails and it would be nice to have all of the values of a complex object available to make troubleshooting easier.

This is where JSON steps in: in most programming languages, it is easy to convert error messages and run time values to a JSON object on error. And since error messages and data values change in structure depending on where they occur, it's easy to dynamically turn any type of object into JSON data.

This data is perfect to store in SQL to be looked at later. None of these ideas are new — nvarchar(max) has been in SQL for a while now, and so programmers everywhere have been storing error information in that datatype.

With SQL Server 2016, it is now easier to examine and parse the error information directly in SQL Server Management Studio with the variety of JSON parsing functions available. No longer do programmers have to copy the code into some different tool — they can easily do it in SSMS.

Appropriate Use Case #2: Piloting Ideas

Most large workplaces have controls in place that prevent developers from making changes in production. In general this is a Good Idea™.

However, controls are sometimes too restrictive. For example, due to security restrictions, lack of server space, company politics, etc… developers are sometimes stuck developing in production. It's an unfortunate fact of life. In those scenarios, developers have to go through hell if they have to elevate each database structure change every time they want to test something in production.

JSON to the rescue! An nvarchar(max) column in a table can have its JSON data be easily added to and modified to fit more data than it was originally intended to hold. All without any database structure change requests.

Now this is not an ideal situation. In fact, it's a scenario that can add a lot of technical debt to the application long-term if not planned for.

However, if a "flexible" JSON column is built with eventual conversion to a traditional table structure in mind from the start, it's actually simple for a developer to transition an entirely JSON storage structure to a relational format later on. They key here is that the developer needs to have this conversion planned from day one.

Appropriate Use Case #3: Non-Analytical Data

Analytical data is SQL Server's bread and butter. Need to store lots of data and be able to query against it all day long? No problem, there are a plethora of performance tuning options to make your queries run fast and efficiently.

However, sometimes not all data needs to be analyzed. Often an app might need to save some session data to a database temporarily — why bother creating all of the maintenance overhead of strict database schemas if the data will never be queried for analytical purposes? Another example might be a website's dynamically created user profile settings. You can build normalized table(s) to store all of that data, but then you will be writing programming logic to normalize and denormalize your data out of the app.

If this data will not have to be searched, then why bother adding all of the overhead? Keep it in JSON and be done with.