JSON Support Is The Best New Developer Feature in SQL 2016 — Part 4: Performance Comparisons

Published on: 2017-02-21

This is the fourth article in my series about learning how to use SQL Server 2016’s new JSON functions. If you haven’t already, you can read Part 1 — Parsing JSON, Part 2 — Creating JSON, and Part 3 — Updating, Adding, and Deleting JSON.


Additional performance comparisons available in an updated post.

We’ve finally come to my favorite part of analyzing any new software feature: performance testing. SQL Server 2016’s new JSON functions are great for parsing JSON data, creating JSON data, and modifying JSON data, but are they efficient?

Today we’ll examine three areas of SQL Server JSON performance:

  1. How to maximize performance for the new SQL Server JSON functions
  2. How the new SQL Server JSON functions compare against what was previously available in SQL Server
  3. How the new SQL JSON functions compare against Newtonsoft’s Json.NET

Maximizing SQL Server JSON Function Performance

I wanted to use a sizable data set in order to test the performance of the new JSON functions in SQL Server 2016. I found arthurkao’s car year/make/model data on GitHub and decided this ~20k element JSON array would be perfect for performance testing purposes. For my tests I’ll be using both the original JSON string as well as a SQL table that I created from the original JSON array:

-- Car data source: https://github.com/arthurkao/vehicle-make-model-data
IF OBJECT_ID('dbo.Cars') IS NOT NULL 
BEGIN
	DROP TABLE dbo.Cars;
END
CREATE TABLE dbo.Cars
(
	Id INT IDENTITY(1,1),
	CarDetails NVARCHAR(MAX)
);
-- See https://gist.github.com/bertwagner/1df2531676112c24cd1ab298fc750eb2 for the full untruncated version of this code
DECLARE @cars nvarchar(max) = '[ {"year":2001,"make":"ACURA","model":"CL"}, {"year":2001,"make":"ACURA","model":"EL"},...]';

INSERT INTO dbo.Cars (CarDetails)
SELECT value FROM OPENJSON(@cars, '$');

SELECT * FROM dbo.Cars;
/* 
Output:
Id          CarDetails
----------- ----------------------------------------------
1           {"year":2001,"make":"ACURA","model":"CL"}
2           {"year":2001,"make":"ACURA","model":"EL"}
3           {"year":2001,"make":"ACURA","model":"INTEGRA"}
...
*/

Unlike XML in SQL Server (which is stored in it’s own datatype), JSON in SQL Server 2016 is stored as an NVARCHAR. This means instead of needing to use special indexes, we can use indexes that we are already familiar with.

To maximize performance, we can use Microsoft’s recommendation of adding a computed column for one of the JSON properties and then indexing that computed column:

-- Remember to turn on "Include Actual Execution Plan" for all of these examples

-- Before we add any computed columns/indexes, let's see our execution plan for our SQL statement with a JSON predicate
SELECT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.model') = 'Golf'
/*
Output:
Id          CarDetails
----------- --------------------------------------------------
1113        {"year":2001,"make":"VOLKSWAGEN","model":"GOLF"}
2410        {"year":2002,"make":"VOLKSWAGEN","model":"GOLF"}
3707        {"year":2003,"make":"VOLKSWAGEN","model":"GOLF"}
...
*/
-- The execution plan shows a Table Scan, not very efficient

-- We can now add a non-persisted computed column for our "model" JSON property.
ALTER TABLE dbo.Cars
ADD CarModel AS JSON_VALUE(CarDetails, '$.model');

-- We add the distinct to avoid parameter sniffing issues.  
-- Our execution plan now shows the extra computation that is occuring for every row of the table scan.
SELECT DISTINCT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.model') = 'Golf'
SELECT DISTINCT * FROM dbo.Cars WHERE CarModel = 'Golf'

Non-persisted computed columns (like in the example above) do not take up any additional space in the table. You can verify this for yourself by running sp_spaceused 'dbo.Cars' before and after adding the non-persisted column to the table.

Having a computed column doesn’t add any performance to our query on its own but it does now allow us to add an index to our parsed/computed JSON property.

Having the computed column doesn’t improve performance — we are still seeing a Table Scan

The clustered index that we add next stores pointers to each parsed/computed value causing the table not to take up any space and only causes the SQL engine to recompute the columns when the index needs to be rebuilt:

-- Add an index onto our computed column
CREATE CLUSTERED INDEX CL_CarModel ON dbo.Cars (CarModel)

-- Check the execution plans again
SELECT DISTINCT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.model') = 'Golf'
SELECT DISTINCT * FROM dbo.Cars WHERE CarModel = 'Golf'
-- We now get index seeks!

And the resulting execution plan now shows both queries (the one using JSON_VALUE() in the WHERE clause directly as well the one calling our computed column) using index seeks to find the data we are looking for:

Yay index seeks!

Overall, adding computed columns to our table adds no overhead in terms of storage space and allows us to then add indexes on JSON properties which improve performance significantly.

SQL Server 2016 JSON vs SQL Server pre-2016 JSON

As I’ve mentioned before, the best option for processing JSON data in SQL Server before 2016 was by using Phil Factor’s amazing JSON parsing function. Although the function works well, it is limited by what SQL Server functionality was available at the time and therefore wasn’t all that efficient.

-- Let's compare how quick Phil Factor's JSON parsing function does against the new SQL 2016 functions
-- Phil's parseJSON function can be downloaded from https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/

SELECT years.StringValue AS Year, makes.StringValue AS Make, models.StringValue AS Model FROM dbo.parseJSON(@cars) models
INNER JOIN dbo.parseJSON(@cars) years ON models.parent_ID = years.parent_ID
INNER JOIN dbo.parseJSON(@cars) makes ON models.parent_ID = makes.parent_ID
WHERE models.NAME = 'model' AND models.StringValue = 'Golf' AND years.NAME = 'year' AND makes.NAME = 'make'

The above query should work for getting the data we need. I’m abusing what the parseJSON function was probably built to do (I don’t think it was intended to parse ~20k element JSON arrays), and I’ll be honest I waited 10 minutes before killing the query. Basically, trying to parse this much data in SQL before 2016 just wasn’t possible (unless you wrote CLR).

Compared to the following queries which is using our indexed computed column SQL Server 2016 is able to return all of the results to us in 1 ms:

-- Indexed computed column returns results in ~1ms
SELECT * FROM dbo.Cars WHERE CarModel = 'Golf'

SQL Server 2016 JSON vs Newtonsoft’s Json.NET

In cases like the above where parsing JSON in SQL Server was never an option, my preferred method has always been to parse data in C#. In particular, Newtonsoft’s Json.NET is the standard for high performance JSON parsing, so let’s take a look at how SQL Server 2016 compares to that.

The following code shows 6 tests I ran in SQL Server 2016:

-- Turn on stats and see how long it takes to parse the ~20k JSON array elements
SET STATISTICS TIME ON

-- Test #1
-- Test how long it takes to parse each property from all ~20k elements from the JSON array
-- SQL returns this query in ~546ms
SELECT JSON_VALUE(value, '$.year') AS [Year], JSON_VALUE(value, '$.make') AS Make, JSON_VALUE(value, '$.model') AS Model FROM OPENJSON(@cars, '$') 

-- Test #2
-- Time to deserialize and query just Golfs without computed column + index
-- This takes ~255ms in SQL Server
SELECT * FROM OPENJSON(@cars, '$') WHERE JSON_VALUE(value, '$.model') = 'Golf'

-- Test #3
-- Time it takes to compute the same query for Golf's with a computed column and clustered index 
-- This takes ~1ms on SQL Server
SELECT * FROM dbo.Cars WHERE CarModel = 'Golf'

-- Test #4
-- Serializing data on SQL Server takes ~110ms
SELECT * FROM dbo.Cars FOR JSON AUTO

-- What about serializing/deserializing smaller JSON datasets?
-- Let's create our smaller set
DECLARE @carsSmall nvarchar(max) = '[ {"year":2001,"make":"ACURA","model":"CL"}, {"year":2001,"make":"ACURA","model":"EL"}, {"year":2001,"make":"ACURA","model":"INTEGRA"}, {"year":2001,"make":"ACURA","model":"MDX"}, {"year":2001,"make":"ACURA","model":"NSX"}, {"year":2001,"make":"ACURA","model":"RL"}, {"year":2001,"make":"ACURA","model":"TL"}]';

-- Test #5
-- Running our query results in the data becoming deserialized in ~0ms
SELECT JSON_VALUE(value, '$.year') AS [Year], JSON_VALUE(value, '$.make') AS Make, JSON_VALUE(value, '$.model') AS Model FROM OPENJSON(@carsSmall, '$') 
--30ms in sql

-- Test #6
-- And serialized in ~0ms
SELECT TOP 7  * FROM dbo.Cars FOR JSON AUTO

And then the same tests in a C# console app using Json.Net:

static void Main(string[] args)
        {
            string cars = @"[ {""year"":2001,""make"":""ACURA"",""model"":""CL""}, ... ]";
            Stopwatch stopwatch = new Stopwatch();

            // Test #1
            stopwatch.Start();
            var deserializedCars = JsonConvert.DeserializeObject<IEnumerable<Car>>(cars);
            stopwatch.Stop();
            long elapsedMillisecondsDeserialize = stopwatch.ElapsedMilliseconds;

            // Test #2 & #3
            stopwatch.Restart();
            var queriedCars = JsonConvert.DeserializeObject<IEnumerable<Car>>(cars).Where(x=>x.model == "Golf");
            stopwatch.Stop();
            long elapsedMillisecondsQuery = stopwatch.ElapsedMilliseconds;

            // Test #4
            stopwatch.Restart();
            var serializedCars = JsonConvert.SerializeObject(deserializedCars);
            stopwatch.Stop();
            long elapsedMillisecondsSerialize = stopwatch.ElapsedMilliseconds;

            // smaller data
            string carsSmall = @"[ {""year"":2001,""make"":""ACURA"",""model"":""CL""}, {""year"":2001,""make"":""ACURA"",""model"":""EL""}, {""year"":2001,""make"":""ACURA"",""model"":""INTEGRA""}, {""year"":2001,""make"":""ACURA"",""model"":""MDX""}, {""year"":2001,""make"":""ACURA"",""model"":""NSX""}, {""year"":2001,""make"":""ACURA"",""model"":""RL""}, {""year"":2001,""make"":""ACURA"",""model"":""TL""}]";

            // Test #5
            stopwatch.Restart();
            var deserializedCarsSmall = JsonConvert.DeserializeObject<IEnumerable<Car>>(carsSmall);
            stopwatch.Stop();
            long elapsedMillisecondsDeserializeSmall = stopwatch.ElapsedMilliseconds;

            // Test #6
            stopwatch.Restart();
            var serializedCarsSmall = JsonConvert.SerializeObject(deserializedCarsSmall);
            stopwatch.Stop();
            long elapsedMillisecondsSerializeSmall = stopwatch.ElapsedMilliseconds;
        }

And the results compared side by side:

Essentially, it seems like Json.Net beats SQL Server 2016 on larger JSON manipulations, both are equal with small JSON objects, and SQL Server 2016 has the advantage at filtering JSON data when indexes are used.

Conclusion

SQL Server 2016 is excellent at working with JSON. Even though Json.NET beats SQL Server 2016 at working with large JSON objects (on the magnitude of milliseconds), SQL Server is equally fast on smaller objects and is advantageous when JSON data needs to be filtered or searched.

I look forward to using the SQL Server 2016 JSON functions more in the future, especially in instances where network I/O benefits me to process JSON on the SQL Server or when working with applications that cannot process JSON data, like SQL Server Reporting Services.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

How SQL 2016’s JSON Functionality Will Make You Want To Use JSON

Published on: 2017-02-14

This post is a response to this month’s T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month’s topic is Fixing Old Problems with Shiny New Toys.


Why I love JSON

I know that lots of users in the SQL community cringe when they hear about JSON, NoSQL, and other non-relational database storage options. I get it. Lots of times a string storage format compromises the many benefits SQL has to offer: indexing, data integrity, simple querying, oversight into schema changes by developers, etc… I’ve seen (and written) some queries that perform like molasses in winter solely due them being in a non-sargable string format.

Yet, with all of those downsides I still love JSON. I think there’s a happy middle ground where DBAs and developers can agree on when it is appropriate to use JSON. Would I ever want to use JSON in a data warehouse that is being heavily queried for analytics? No. Would I also want to use JSON where performance is extremely important? No. Would I want to use it in an application where data or schema integrity is extremely important? No.

However, I love using JSON when capturing dynamically structured user input, especially if the database is being used only to persist the data instead of analyze it. I love having JSON in the database because it so easily serializes/deserializes between JSON and my C# models. If I want low latency to my webapp, I also might write the JSON from the app to the database and then have it get queued to transform into normalized data by an ETL later on. These reason in particular are what make me incredibly excited for the new JSON functionality in SQL Server 2016.

Okay, I guess there are some valid uses for JSON. What are my options for using JSON in SQL Server?

Before SQL Server 2016 was released there was no support for JSON in SQL Server. None. Zilch. Zero. The best thing we had was Phil Factor’s amazing JSON parsing user defined function or writing your own CLR.

Phil’s function is truly amazing and it cleverly uses some built-in SQL functionality to deserialize almost any type of JSON string you can throw at it. It does however have downsides — the function has a few edge case bugs (try adding a space before any “:” in your JSON), it does not allow for easy querying of the JSON, and the function needs to be added to any database you want to use it on.

I’m not trying to downplay Phil’s function — I love it and use it in a lot of my code — but the fact of the matter is that it is a very clever workaround for something that Microsoft wasn’t supporting at the time.

So what did JSON parsing look like in the pre-2016 world? Let’s take a look at some example data I generated using http://www.json-generator.com/:

-- This version of that data has new lines removed to save space.
-- Please see https://gist.github.com/bertwagner/965acde93706a9a5d772509e56247a1c for the neatly formatted version of the data.

DECLARE @WebsiteJson nvarchar(max) = '{ "Users": [ { "_id": "589f14e8427b0030d59615b1", "index": 0, "guid": "4a5ecd66-8c41-4553-9c1b-7597124b46e3", "isActive": true, "balance": "$1,418.71", "picture": "http://placehold.it/32x32", "age": 26, "eyeColor": "brown", "name": "Hanson Larson", "gender": "male", "company": "KENEGY", "email": "hansonlarson@kenegy.com", "phone": "+1 (804) 447-3852", "address": "698 Durland Place, Hachita, Louisiana, 1920", "about": "Consectetur laboris sunt proident ullamco ex excepteur duis cillum sit dolor occaecat officia. Sunt amet cupidatat enim mollit esse non minim dolore ullamco minim duis do. Sunt sint aliqua sit excepteur anim proident consequat magna reprehenderit laborum. Voluptate officia et duis sit laborum.\r\n", "registered": "2016-11-09T10:59:38 +05:00", "latitude": -72.754583, "longitude": 15.975401, "tags": [ "culpa", "excepteur", "adipisicing", "reprehenderit", "eiusmod", "officia", "incididunt" ], "friends": [ { "id": 0, "name": "Blake Dodson" }, { "id": 1, "name": "Marta Bullock" }, { "id": 2, "name": "Benson Soto" } ], "greeting": "Hello, Hanson Larson! You have 7 unread messages.", "favoriteFruit": "apple" }, { "_id": "589f14e8547a192d7a11a174", "index": 1, "guid": "a3c6533b-4c50-4054-9517-a03e8aa9c9ec", "isActive": true, "balance": "$3,450.53", "picture": "http://placehold.it/32x32", "age": 28, "eyeColor": "green", "name": "Wood Douglas", "gender": "male", "company": "FARMAGE", "email": "wooddouglas@farmage.com", "phone": "+1 (819) 515-3319", "address": "113 Dean Street, Kylertown, Massachusetts, 1666", "about": "Sunt laborum duis pariatur qui esse velit dolore mollit magna aute. Eiusmod velit dolor nostrud dolor culpa labore duis eiusmod sunt Lorem. Adipisicing eu minim voluptate veniam quis eu laboris dolore ipsum. Voluptate deserunt velit qui aliqua cillum. Voluptate in sit mollit irure velit fugiat dolore minim commodo. Commodo ea aliquip cupidatat commodo consequat ipsum laboris. Excepteur velit voluptate laborum dolor officia ullamco.\r\n", "registered": "2015-09-22T11:15:47 +04:00", "latitude": -88.972896, "longitude": 35.988156, "tags": [ "tempor", "in", "adipisicing", "nostrud", "officia", "cupidatat", "occaecat" ], "friends": [ { "id": 0, "name": "Vivian Gates" }, { "id": 1, "name": "Quinn Fitzgerald" }, { "id": 2, "name": "Jenifer Lowery" } ], "greeting": "Hello, Wood Douglas! You have 2 unread messages.", "favoriteFruit": "banana" }, { "_id": "589f14e8e9c6c96cd1f0b548", "index": 2, "guid": "4c71a47a-8212-46aa-97e3-bf8c136c49f5", "isActive": true, "balance": "$2,118.35", "picture": "http://placehold.it/32x32", "age": 27, "eyeColor": "green", "name": "Copeland Wiley", "gender": "male", "company": "OPTYK", "email": "copelandwiley@optyk.com", "phone": "+1 (801) 416-3699", "address": "824 Bristol Street, Sparkill, Delaware, 1706", "about": "In veniam elit nisi reprehenderit adipisicing veniam magna. Veniam proident commodo deserunt minim eiusmod commodo aliquip voluptate in occaecat id cillum fugiat. Excepteur magna adipisicing officia qui sit dolor nisi pariatur anim excepteur.\r\n", "registered": "2016-09-27T03:54:01 +04:00", "latitude": -52.752953, "longitude": -119.42916, "tags": [ "excepteur", "labore", "irure", "voluptate", "enim", "commodo", "sint" ], "friends": [ { "id": 0, "name": "Harding Sanford" }, { "id": 1, "name": "Tessa Colon" }, { "id": 2, "name": "Washington Jordan" } ], "greeting": "Hello, Copeland Wiley! You have 7 unread messages.", "favoriteFruit": "strawberry" }, { "_id": "589f14e857f47268893121d0", "index": 3, "guid": "1ba95a0c-7559-4fdd-b52a-94f5d2c76ca0", "isActive": true, "balance": "$1,955.54", "picture": "http://placehold.it/32x32", "age": 20, "eyeColor": "green", "name": "Kathleen Bradley", "gender": "female", "company": "PASTURIA", "email": "kathleenbradley@pasturia.com", "phone": "+1 (916) 463-3527", "address": "512 Holmes Lane, Chaparrito, District Of Columbia, 1651", "about": "Ea nostrud minim exercitation commodo laborum. Minim irure est sint adipisicing. Voluptate nisi fugiat qui et quis sint nisi ea deserunt laboris est dolor cillum. Ad sit duis nisi culpa aliquip incididunt.\r\n", "registered": "2015-07-20T03:11:35 +04:00", "latitude": -9.512905, "longitude": -108.671577, "tags": [ "eiusmod", "enim", "tempor", "amet", "reprehenderit", "sit", "ut" ], "friends": [ { "id": 0, "name": "Rowe Crawford" }, { "id": 1, "name": "Ginger Bray" }, { "id": 2, "name": "Nichole Bradford" } ], "greeting": "Hello, Kathleen Bradley! You have 6 unread messages.", "favoriteFruit": "banana" }, { "_id": "589f14e8128844a31ccb82e2", "index": 4, "guid": "03e2341e-da4d-48bb-8d6f-006234835828", "isActive": false, "balance": "$1,827.38", "picture": "http://placehold.it/32x32", "age": 32, "eyeColor": "brown", "name": "Colon Mays", "gender": "male", "company": "FREAKIN", "email": "colonmays@freakin.com", "phone": "+1 (919) 555-3410", "address": "627 Reed Street, Taft, Texas, 1724", "about": "Ullamco magna tempor dolore enim veniam aute incididunt. Deserunt nostrud fugiat reprehenderit consequat deserunt. Labore consequat magna pariatur amet mollit aute reprehenderit ea dolore exercitation anim.\r\n", "registered": "2014-06-08T02:13:34 +04:00", "latitude": 35.186997, "longitude": -1.722535, "tags": [ "cillum", "amet", "ad", "enim", "tempor", "amet", "est" ], "friends": [ { "id": 0, "name": "Gladys Rojas" }, { "id": 1, "name": "Luann Craft" }, { "id": 2, "name": "Camille Gibson" } ], "greeting": "Hello, Colon Mays! You have 7 unread messages.", "favoriteFruit": "apple" } ] }';

Basically we have a JSON string that represents five users and some of their attributes (note: I would argue that storing this much data about an individual would be better suited to fit into some structured tables, but alas this was the type of data that was easy to generate for example purposes).

If I wanted to parse out each user’s name to use for an infrequently ran report or for ETL purposes, this is what my query would have to look like:

SELECT 
	NAME, 
	StringValue, 
	ValueType
FROM 
(
	SELECT 
		[NAME], 
		StringValue, 
		ValueType, 
		LAG([NAME],1,0) OVER (ORDER BY  element_id, sequenceNo, parent_ID) as PreviousKey 
	FROM 
		Sandbox.dbo.parseJSON(@WebsiteJson) -- This is Phil Factor's JSON parsing function: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
)d 
WHERE 
	PreviousKey <> 'id' 
	and [NAME] = 'name'

There are a couple funny things we have to do in pre-2016 SQL to parse this JSON:

  1. We have to add the parseJson() function to our database
  2. Since there are multiple key’s named “NAME”, we need to add a LAG() function to help identify the property’s key so we can ensure we have the correct one.
  3. Nested queries required since windowed functions can’t run in the WHERE clause.

Overall this code works and it gets the job done. However we need to ensure our server has the parseJson( function added and filtering our JSON data to pull out the properties we are interested in takes a little bit of work. Additionally, the parseJson() function does do some heavy processing which affects our overall performance:

The Shiny New Toys in SQL Server 2016: JSON Functions

SQL Server 2016 includes many new JSON parsing functions that are available for us to use. Writing a query to return the same result set as above is now as simple as this:

SELECT 
	JSON_VALUE(value, '$.name') 
FROM 
	OPENJSON(@WebsiteJson,'$.Users')

The OPENJSON() function allows us to write an XPath query to filter the JSON Users object into five separate rows of data, one per user:

Then, we use the JSON_VALUE() function and XPath once again to filter out just the “name” property. Overall, this code is much simpler to write, performs more consistently (the issue with a space character before a “:” is handled correctly), and performs much quicker:

I love the syntax of the new SQL JSON parsing functions: it’s easy to remember, easy to use in real life scenarios, and is very fast. This is one new feature of SQL Server 2016 that definitely makes my life easier and makes any other ways of parsing out JSON data obsolete.


If you are interested in learning more about JSON in SQL server, I’ve been blogging the past few weeks about how to use all of SQL’s new JSON functions.

There’s also one bug that I found with how SQL handles JSON deletes that I submitted to Microsoft Connect if you want to upvote and see JSON continue to evolve on SQL Server.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

JSON Support Is The Best New Developer Feature in SQL 2016 — Part 3: Updating, Adding, and Deleting…

Published on: 2017-02-07

This is the third article in my series about learning how to use SQL Server 2016’s new JSON functions. If you haven’t already, you can read Part 1 — Parsing JSON and Part 2 — Creating JSON.


So far we’ve looked at how to parse existing JSON objects and how to create new JSON objects. Today I want to look at the easy ways to modify JSON objects as well as the (mostly) easy ways to delete elements from a JSON object.

SQL Server 2016 offers us the new JSON_MODIFY() function for updating existing JSON strings. It’s pretty simple to use for replacing existing values in a JSON string:

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- I upgraded some features in my Volkswagen recently, technically making it equivalent to a "GLI" instead of a "GL".  
-- Let's update our JSON using JSON_MODIFY:
SET @garage = JSON_MODIFY(@garage, '$.Cars[0].Model.Trim', 'GLI')
SELECT @garage
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }

The JSON_MODIFY() function works similar to a SQL REPLACE() function: the first argument specifies what data we are modifying, the second argument selects which property we are going to replace via XPath syntax, and the third argument specifies what we are replacing the value with. Pretty easy!

Replacing values with JSON is simple. Adding new values into existing JSON is also fairly simple using JSON_MODIFY():

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- I decided to sell my Golf.  Let's add a new "SellDate" property to the JSON saying when I sold my Volkswagen.
-- If we use strict mode, you'll see we can't add SellDate because the key never existed before
--SELECT JSON_MODIFY(@garage, 'append strict $.Cars[0].SellDate', '2017-02-17T00:00:00.000Z')
-- Output: Property cannot be found on the specified JSON path.

-- However, in lax mode (default), we have no problem adding the SellDate
SELECT JSON_MODIFY(@garage, 'append lax $.Cars[0].SellDate', '2017-02-17T00:00:00.000Z')
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" ,"SellDate":["2017-02-17T00:00:00.000Z"]}, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }

-- After selling my Golf, I bought another car a few days later: A new Volkswagen Golf GTI.  Let's add it to our garge:
-- Note the use of JSON_QUERY; this is so our string is interpreted as a JSON object instead of a plain old string
SET @garage = JSON_MODIFY(@garage, 'append $.Cars', JSON_QUERY('{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }'))
SELECT @garage;
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

Replacing data and adding new data is pretty easy with JSON_MODIFY(). This new SQL Server 2016 function ain’t no two trick pony though — it allows deletion of data as well!

Deleting properties in a JSON object is fairly straightforward: all you have to do is run the function with the same arguments as our modification example, except this time passing in NULL as our replacement value:

DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z", "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }'

-- Let's remove the PurchaseDate property on my original Volkswagen Golf since it's not relevant anymore:
SET @garage = JSON_MODIFY(@garage, '$.Cars[0].PurchaseDate', NULL)
SELECT @garage
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

Up to this point, JSON_MODIFY() has worked great for modifying, adding to, and deleting properties from our JSON data. However, there is one serious flaw with JSON_MODIFY() and that’s deleting JSON array values — instead of deleting the value from the array and then shifting the rest of the array over, it simply replaces the array value with a NULL:

DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }'

-- I realize it's not worth keeping the original Volkswagen in my @garage data any longer, so let's completely remove it.
-- Note, if we use NULL as per the MSDN documentation, we don't actually remove the first car element of the array - it just gets replaced with NULL
-- This is problematic if we expect the indexes of our array to shift by -1.
SELECT JSON_MODIFY(@garage, '$.Cars[0]', NULL)
-- Output: { "Cars": [null, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

-- To truly delete it (and not have the NULL appear as the first item in the array) we have to convert to a rowset, select everything that's not the first row, aggregate the rows into a string (UGH) and then recreate as JSON.
-- This is incredibly ugly.  The STREAM_AGG() function in SQL vNext should make it a little cleaner, but why doesn't the JSON_MODIFY NULL syntax just get rid of the element in the array?
-- I have opened a Microsoft connect issue for this here: https://connect.microsoft.com/SQLServer/feedback/details/3120404 
SELECT JSON_QUERY('{ "Cars" : [' + 
		STUFF((
			   SELECT	',' + value
               FROM OPENJSON(@garage, '$.Cars') 
			   WHERE [key] <> 0
               FOR XML PATH('')), 1, 1, '') + '] }')
-- Output: { "Cars" : [{ "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

I hate that solution. FOR XML PATH() to rebuild or JSON array? Ugly, ugly, ugly. I have been impressed with all of the new JSON functionality in SQL Server 2016 except for the deletion of array elements.

Deleting properties should be the same as deleting array elements with JSON_MODIFY(): the property and array element should be completely removed from the JSON object, not just replaced with a NULL. I opened a Microsoft Connect issue for this bug here, please vote for it if you want to see this problem fixed as well: https://connect.microsoft.com/SQLServer/feedback/details/3120404

Microsoft’s been pretty good about fixing bugs lately, so let’s hope this gets fixed in SQL Server vNext!

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!