Is It Possible To Conditionally Index JSON Data?

Watch this week's video on YouTube

Recently I received a great question from an attendee to one of my sessions on JSON (what's up Nam!):

2018-04-25_15-58-21

At first glance it sounds like a filtered index question, and ultimately it is, but because of some of the intricacies involved in the response I thought it would make for a good blog post.

The Problem: Schema On Read

Imagine I have a central table that keeps track of warnings and errors for my burrito ordering app:

DROP TABLE IF EXISTS dbo.BurritoAppLog;
GO

CREATE TABLE dbo.BurritoAppLog 
( 
    Id int IDENTITY PRIMARY KEY,
    ErrorDetails nvarchar(1000)
); 
GO 

INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Warning", "MessageId": 100, "Severity": "High", "Information":"Running low on steak." }'); 
INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Warning", "MessageId": 50, "Severity": "Low", "Information":"Running low on queso." }');
GO 4000
INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Error", "MessageId": 10, "User":"Bert", "ErrorMessage":"Lettuce not available." }'); 
INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Error", "MessageId": 20, "User":"Jim", "ErrorMessage":"Cannot wrap burrito with quadruple meat." }'); 
GO 100

2018-04-25_19-21-04

Now imagine wanting to generate a report of only the rows that are errors.

Obviously, you'd want to index this data for faster querying performance.  Adding a non-clustered index on a non-persisted computed column of our JSON "Type" property will accomplish that:

ALTER TABLE dbo.BurritoAppLog 
ADD ErrorType AS JSON_VALUE(ErrorDetails, '$.Type');

ALTER TABLE dbo.BurritoAppLog 
ADD MessageId AS JSON_VALUE(ErrorDetails, '$.MessageId');

CREATE INDEX IX_ErrorType ON dbo.BurritoAppLog (ErrorType) INCLUDE (MessageId);

SELECT MessageId FROM dbo.BurritoAppLog WHERE ErrorType = 'Error'

And that works great.  Except that error entries in our table make up only 2.5% of our total rows.  Assuming we'll never need to query WHERE ErrorType = 'Warning' , this index is using a lot of unnecessary space.

So what if we create a filtered index instead?

Filtered JSON Indexes...

A filtered index should benefit us significantly here: it should save us space (since it won't include all of those warning rows) and it should make our INSERT queries into this table faster since the index won't need to be maintained for our non-"Error" rows.

So let's create a filtered index:

CREATE INDEX FX_ErrorType ON dbo.BurritoAppLog (ErrorType) INCLUDE (MessageId) WHERE ErrorType = 'Error'

Oh.

2018-04-25_19-47-03-1

So I guess we can't create a filtered index where the filter is on a computed column.  Maybe SQL Server won't mind if we persist the computed column?

DROP INDEX IX_ErrorType ON dbo.BurritoAppLog

ALTER TABLE dbo.BurritoAppLog
DROP COLUMN ErrorType;

ALTER TABLE dbo.BurritoAppLog 
ADD ErrorType AS JSON_VALUE(ErrorDetails, '$.Type') PERSISTED;

CREATE INDEX FX_ErrorType ON dbo.BurritoAppLog (ErrorType) INCLUDE (MessageId) WHERE ErrorType = 'Error'

NOOOOOOPPPPEEEE.  Same error message.

The issue is that SQL Server does not like computed columns, persisted or not, in a filtered index's WHERE clause.  It's one of the many limitations of filtered indexse (Aaron Bertrand has a great post outlining many of the shortcomings).

Computed Column Filtered Index Workaround

What is a performance minded, space-cautious, JSON-loving developer supposed to do?

One workaround to get our filtered index would be to parse our ErrorType property into its own table column on insert:

ALTER TABLE dbo.BurritoAppLog 
ADD PermanentErrorType varchar(10);

UPDATE dbo.BurritoAppLog SET PermanentErrorType = JSON_VALUE(ErrorDetails, '$.Type');

2018-04-25_20-01-45

With our PermanentErrorType column in place, we have no problem generating our filtered index:

CREATE INDEX FX_PermanentErrorType ON dbo.BurritoAppLog (PermanentErrorType) INCLUDE (MessageId) WHERE PermanentErrorType = 'Error'

If we compare the sizes of our nonclustered index to our filtered index, you'll immediately that the filtered index is significantly smaller:

2018-04-25_20-12-31-1

However, our table size is now slightly larger because of the added table column.

Conclusion

So what do you do if you run into this situation?  Well, if the ratio of undesired records to desired records is large like in the example above, you might want to make a permanent column to include in your filtered index - the size/performance benefit is certainly there.  This does mean that your table size will be larger (additional column) but performance will be faster if your queries are able to use the smaller filtered index.