JSON Support Is The Best New Developer Feature in SQL 2016 — Part 3: Updating, Adding, and Deleting…

Published Tue 07 February 2017 in SQL > Development > JSON

db7a3-1m1vek-lxcgj2texkk_z0xg

This is the third article in my series about learning how to use SQL Server 2016's new JSON functions. If you haven't already, you can read Part 1 — Parsing JSON and Part 2 — Creating JSON.

So far we've looked at how to parse existing JSON objects and how to create new JSON objects. Today I want to look at the easy ways to modify JSON objects as well as the (mostly) easy ways to delete elements from a JSON object.

SQL Server 2016 offers us the new JSON_MODIFY() function for updating existing JSON strings. It's pretty simple to use for replacing existing values in a JSON string:

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- I upgraded some features in my Volkswagen recently, technically making it equivalent to a "GLI" instead of a "GL".  
-- Let's update our JSON using JSON_MODIFY:
SET @garage = JSON_MODIFY(@garage, '$.Cars[0].Model.Trim', 'GLI')
SELECT @garage
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }

The JSON_MODIFY() function works similar to a SQL REPLACE() function: the first argument specifies what data we are modifying, the second argument selects which property we are going to replace via XPath syntax, and the third argument specifies what we are replacing the value with. Pretty easy!

Replacing values with JSON is simple. Adding new values into existing JSON is also fairly simple using JSON_MODIFY():

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- I decided to sell my Golf.  Let's add a new "SellDate" property to the JSON saying when I sold my Volkswagen.
-- If we use strict mode, you'll see we can't add SellDate because the key never existed before
--SELECT JSON_MODIFY(@garage, 'append strict $.Cars[0].SellDate', '2017-02-17T00:00:00.000Z')
-- Output: Property cannot be found on the specified JSON path.

-- However, in lax mode (default), we have no problem adding the SellDate
SELECT JSON_MODIFY(@garage, 'append lax $.Cars[0].SellDate', '2017-02-17T00:00:00.000Z')
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" ,"SellDate":["2017-02-17T00:00:00.000Z"]}, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }

-- After selling my Golf, I bought another car a few days later: A new Volkswagen Golf GTI.  Let's add it to our garge:
-- Note the use of JSON_QUERY; this is so our string is interpreted as a JSON object instead of a plain old string
SET @garage = JSON_MODIFY(@garage, 'append $.Cars', JSON_QUERY('{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }'))
SELECT @garage;
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

Replacing data and adding new data is pretty easy with JSON_MODIFY(). This new SQL Server 2016 function ain't no two trick pony though — it allows deletion of data as well!

Deleting properties in a JSON object is fairly straightforward: all you have to do is run the function with the same arguments as our modification example, except this time passing in NULL as our replacement value:

DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z", "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }'

-- Let's remove the PurchaseDate property on my original Volkswagen Golf since it's not relevant anymore:
SET @garage = JSON_MODIFY(@garage, '$.Cars[0].PurchaseDate', NULL)
SELECT @garage
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

Up to this point, JSON_MODIFY() has worked great for modifying, adding to, and deleting properties from our JSON data. However, there is one serious flaw with JSON_MODIFY() and that's deleting JSON array values — instead of deleting the value from the array and then shifting the rest of the array over, it simply replaces the array value with a NULL:

DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }'

-- I realize it's not worth keeping the original Volkswagen in my @garage data any longer, so let's completely remove it.
-- Note, if we use NULL as per the MSDN documentation, we don't actually remove the first car element of the array - it just gets replaced with NULL
-- This is problematic if we expect the indexes of our array to shift by -1.
SELECT JSON_MODIFY(@garage, '$.Cars[0]', NULL)
-- Output: { "Cars": [null, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

-- To truly delete it (and not have the NULL appear as the first item in the array) we have to convert to a rowset, select everything that's not the first row, aggregate the rows into a string (UGH) and then recreate as JSON.
-- This is incredibly ugly.  The STREAM_AGG() function in SQL vNext should make it a little cleaner, but why doesn't the JSON_MODIFY NULL syntax just get rid of the element in the array?
-- I have opened a Microsoft connect issue for this here: https://connect.microsoft.com/SQLServer/feedback/details/3120404 
SELECT JSON_QUERY('{ "Cars" : [' + 
        STUFF((
               SELECT   ',' + value
               FROM OPENJSON(@garage, '$.Cars') 
               WHERE [key] <> 0
               FOR XML PATH('')), 1, 1, '') + '] }')
-- Output: { "Cars" : [{ "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

I hate that solution. FOR XML PATH() to rebuild or JSON array? Ugly, ugly, ugly. I have been impressed with all of the new JSON functionality in SQL Server 2016 except for the deletion of array elements.

Deleting properties should be the same as deleting array elements with JSON_MODIFY(): the property and array element should be completely removed from the JSON object, not just replaced with a NULL. I opened a Microsoft Connect issue for this bug here, please vote for it if you want to see this problem fixed as well: https://connect.microsoft.com/SQLServer/feedback/details/3120404

Microsoft's been pretty good about fixing bugs lately, so let's hope this gets fixed in SQL Server vNext!

JSON Support Is The Best New Developer Feature in SQL 2016 — Part 2: Creating JSON

Published Tue 31 January 2017 in SQL > Development > JSON

56938-1m1vek-lxcgj2texkk_z0xg

This is the second article in my series about learning how to use SQL Server 2016's new JSON functions. If you haven't already, you can read Part 1 — Parsing JSON.

Last time we looked at SQL 2016's new functions for parsing JSON data. Today, I want to explore the different options available for creating JSON data from a SQL result set.

The first option we have for creating JSON is by hardcoding a SQL string value. This is a terribly painful way to creating JSON and doesn't use any of SQL 2016's new functionality. However, for the sake of completeness I thought it would good to highlight the obvious:

-- The most primative way of creating JSON in SQL.  We don't want to have to do this
DECLARE @garage nvarchar(100) = '{ "Cars" : [{ "Make" : "Volkswagen"}, { "Make" : "Subaru"}] }'

-- But it works!
SELECT @garage
-- Output: { "Cars" : [{ "Make" : "Volkswagen"}, { "Make" : "Subaru"}] }

-- And with our SQL 2016 ISJSON() function we can check that the JSON string is valid
SELECT ISJSON(@garage)
-- Output: 1

Fortunately, SQL 2016 makes it much easier to generate JSON data from a query's result set. First, let's create some data to play around with:

-- Create our table with test data
DROP TABLE IF EXISTS ##Garage;
CREATE TABLE ##Garage
(
    Id int IDENTITY(1,1),
    Make varchar(100),
    BaseModel varchar(50),
    Trim varchar(50),
    Year int,
    PurchaseDate datetime2
);
INSERT INTO ##Garage VALUES ('Volkswagen', 'Golf', 'GL', 2003, '2006-10-05');
INSERT INTO ##Garage VALUES ('Subaru', 'Impreza', 'Premium', 2016, '2015-08-18');

-- Take a look at our data
SELECT * FROM ##Garage;

The data above should look pretty familiar from Part 1 of this series. It's only two rows and a handful of columns, but it should get the job done at showing how to use SQL's two new JSON creating functions.

First up is the clause FOR JSON AUTO. This clause will take the results of a query and format them into very basic JSON. Not much magic here, but it sure beats having to hardcode (or build dynamic JSON strings) using the first method outlined above.

FOR JSON AUTO does offer some formatting flexibility though as shown in the examples: nesting JSON objects is possible through joining of tables.

-- AUTO will format a result into JSON following the same structure of the result set
SELECT Make, BaseModel, Trim, Year, PurchaseDate
FROM ##Garage
FOR JSON AUTO;
-- Output: [{"Make":"Volkswagen","BaseModel":"Golf","Trim":"GL","Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","BaseModel":"Impreza","Trim":"Premium","Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]

-- Using aliases will rename JSON keys
SELECT Make AS [CarMake] 
FROM ##Garage 
FOR JSON AUTO;
-- Output: [{"CarMake":"Volkswagen"},{"CarMake":"Subaru"}]

-- Any joined tables will get created as nested JSON objects.  The alias of the joined tables becomes the name of the JSON key
SELECT g1.Make,  Model.BaseModel as Base, Model.Trim, g1.Year, g1.PurchaseDate
FROM ##Garage g1
INNER JOIN ##Garage Model on g1.Id = Model.Id
FOR JSON AUTO;
-- Output: [{"Make":"Volkswagen","Year":2003,"PurchaseDate":"2006-10-05T00:00:00","Model":[{"Base":"Golf","Trim":"GL"}]},{"Make":"Subaru","Year":2016,"PurchaseDate":"2015-08-18T00:00:00","Model":[{"Base":"Impreza","Trim":"Premium"}]}]

-- Finally we can encapsulate our entire JSON result in a parent element by specifiying the ROOT option
SELECT Make, BaseModel, Trim, Year, PurchaseDate
FROM ##Garage
FOR JSON AUTO, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","BaseModel":"Golf","Trim":"GL","Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","BaseModel":"Impreza","Trim":"Premium","Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

Although FOR JSON AUTO is perfect for quick and dirty JSON string creation, SQL offers much more customization with FOR JSON PATH. TheFOR JSON PATH clause will take column aliases into consideration when building the JSON structure, making it unnecessary to have to join data in order to get a nested JSON schema.

-- PATH will format a result using dot syntax in the column aliases.  Here's an example with just default column names
SELECT Make, BaseModel, Trim, Year, PurchaseDate
FROM ##Garage
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","BaseModel":"Golf","Trim":"GL","Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","BaseModel":"Impreza","Trim":"Premium","Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- And here is the same example, just assigning aliases to define JSON nested structure
SELECT Make, BaseModel as [Model.Base], Trim AS [Model.Trim], Year, PurchaseDate
FROM ##Garage
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL"},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium"},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- We can actually go multiple levels deep with this type of alias dot notation nesting
SELECT Make, BaseModel as [Model.Base], Trim AS [Model.Trim], 'White' AS [Model.Color.Exterior], 'Black' AS [Model.Color.Interior], Year, PurchaseDate
FROM ##Garage
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL","Color":{"Exterior":"White","Interior":"Black"}},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium","Color":{"Exterior":"White","Interior":"Black"}},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- Concatenating data rows with UNION or UNION ALL just adds the row as a new element as part of the JSON array
SELECT Make,  BaseModel AS [Model.Base], Trim AS [Model.Trim], Year, PurchaseDate
FROM ##Garage WHERE Id = 1
UNION ALL
SELECT Make,  BaseModel, Trim, Year, PurchaseDate
FROM ##Garage WHERE Id = 2
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL"},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium"},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- We can even include our FOR JSON in our SELECT statement to generate JSON strings for each row of our result set
SELECT g1.*, (SELECT Make, BaseModel AS [Model.Base], Trim AS [Model.Trim], Year, PurchaseDate FROM ##Garage g2 WHERE g2.Id = g1.Id FOR JSON PATH, ROOT('Cars')) AS [Json]
FROM ##Garage g1
/* Output: 
Id  Make          BaseModel    Trim      Year    PurchaseDate                Json
--- ------------- ------------ --------- ------- --------------------------- --------------------------------------------------------------------------------------------------------------------------
1   Volkswagen    Golf         GL        2003    2006-10-05 00:00:00.0000000 {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL"},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"}]}
2   Subaru        Impreza      Premium   2016    2015-08-18 00:00:00.0000000 {"Cars":[{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium"},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}
*/

And that's it for creating JSON data in SQL Server 2016. Stay tuned over the next few weeks as we look at other SQL JSON functions that will help us modify data as well as a comparison of how SQL's JSON functions perform against other languages JSON serialization/deserialization methods.

JSON Support Is The Best New Developer Feature in SQL 2016 — Overview Part 1: Parsing JSON

Published Tue 24 January 2017 in SQL > Development > JSON

f86b8-18dn2mndvskm8-0nimg-smq

As a developer my favorite new feature of SQL Server 2016 is JSON support.

I love JSON in SQL because I already love JSON everywhere outside of SQL: it uses much less space than XML for serializing data, it's what most apps are now using for API communication, and when web developing I love that it is already valid JavaScript (no need to deserialize!).

I had this same type of excitement for XML in SQL Server, but after using it the excitement quickly turned into disappointment: having to constantly use the XML datatype was inconvenient (when most XML data I used was already stored in nvarchar(max) columns) and I never found the syntax of OPENXML() to be that intuitive.

Everything I've done with JSON in SQL Server 2016 so far has been great. I've already been storing persistent JSON in SQL, so being able to manipulate JSON within SQL is even better. In this series of posts I will go over the various functionalities of using JSON in SQL Server 2016:

Part 1 — Parsing JSON

What is JSON?

JavaScript Object Notation (JSON) " is a lightweight data-interchange format."

My simple, mostly caveat-free* explanation is that it is a format for storing object data in JavaScript. It's lightweight and easy to read, so it's used in lots of applications that aren't just JavaScript (although it's especially easy to consume in JavaScript because it is JavaScript*).

*Caveats? See http://stackoverflow.com/a/383699

So what's JSON look like? The JSON below represents the current inventory of cars in my garage. It shows I have two cars as well as some of their attributes:

{
    "Cars": [{
        "Make": "Volkswagen",
        "Model": {
            "Base": "Golf",
            "Trim": "GL"
        },
        "Year": 2003,
        "PurchaseDate": "2006-10-05T00:00:00.000Z"
    }, {
        "Make": "Subaru",
        "Model": {
            "Base": "Impreza",
            "Trim": "Premium"
        },
        "Year": 2016,
        "PurchaseDate": "2015-08-18T00:00:00.000Z"
    }]
}

Strict versus Lax mode

For any of the SQL JSON functions (OPENJSON(), JSON_VALUE(), JSON_QUERY(), andJSON_MODIFY()) you can specify whether invalid JSON paths will return NULL or an error. The default value is lax, which will return a NULL for non-existing JSON paths, whereas strict will return an error message.

-- Lax (default: function will return an error if invalid JSON path specified
SELECT JSON_VALUE('{ "Color" : "Red" }', '$.Shape') --lax is the default, so you don't need to be explicitly state it
-- Output: NULL

SELECT JSON_VALUE('{ "Color" : "Red" }', 'lax $.Shape')
-- Output: NULL

-- Strict: function will return an error if invalid JSON path specified
SELECT JSON_VALUE('{ "Color" : "Red" }', 'strict $.Shape')
-- Output: Property cannot be found on the specified JSON path.

The laxmodifier is helpful when writing queries that check to see if values exist in a JSON object while the strict modifier works great for error checking and and validation.

ISJSON()

A simple function for verifying whether an inputted string is valid JSON. This is great to use in order to validate JSON formatting before running any of remaining functions in this post.

SELECT ISJSON('{ "Color" : "Blue" }') -- Returns 1, valid
-- Output: 1

SELECT ISJSON('{ "Color" : Blue }') -- Returns 0, invalid, missing quotes
-- Output: 0

SELECT ISJSON('{ "Number" : 1 }') -- Returns 1, valid, numbers are allowed
-- Output: 1

SELECT ISJSON('{ "PurchaseDate" : "2015-08-18T00:00:00.000Z" }') -- Returns 1, valid, dates are just strings in ISO 8601 date format https://en.wikipedia.org/wiki/ISO_8601
-- Output: 1

SELECT ISJSON('{ "PurchaseDate" : 2015-08-18 }') -- Returns 0, invalid
-- Output: 0

JSON_VALUE()

Extracts a scalar value from a JSON string. This function needs to be able to parse the value, so it will not parse out complex objects like arrays.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

SELECT JSON_VALUE(@garage, '$.Cars[0].Make') -- Return the make of the first car in our array
-- Output: Volkswagen

SELECT CAST(JSON_VALUE(@garage, '$.Cars[0].PurchaseDate') as datetime2) -- Return the Purchase Date of the first car in our array and convert it into a DateTime2 datatype
-- Output: 2006-10-05 00:00:00.0000000

SELECT JSON_VALUE(@garage, '$.Cars') -- This returns NULL because the values of Cars is an array instead of a simple object
-- Output: NULL

SELECT JSON_VALUE(@garage, '$.Cars[1].Model') -- This is also invalid because JSON_VALUE cannot return an array...only scalar values allowed!
-- Output: NULL

SELECT JSON_VALUE(@garage, '$.Cars[1].Model.Base') -- Much better
-- Output: Impreza

JSON_VALUE() is great for accessing operational data that might be using JSON to store multiple attributes for a single entry.

JSON_QUERY()

JSON_QUERY() is meant to work for all of the datatypes that JSON_VALUE() doesn't know how to return: basically JSON_QUERY() returns JSON string representations of complex JSON objects like arrays.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- This returns NULL because the values of Cars is an array instead of a simple object
SELECT JSON_VALUE(@garage, '$.Cars') 
-- Output: NULL

-- Using JSON_QUERY() however returns the JSON string representation of our array object
SELECT JSON_QUERY(@garage, '$.Cars') 
-- Output: [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }]

-- This instance of JSON_VALUE() correctly returns a singular scalar value
SELECT JSON_VALUE(@garage, '$.Cars[0].Make')
-- Output: Volkswagen

-- Using JSON_QUERY will not work for returning scalar values - it only will return JSON strings for complex objects
SELECT JSON_QUERY(@garage, '$.Cars[0].Make')
-- Output: NULL

It's possible to use JSON_QUERY() along with JSON_VALUE() to essentially extract any type of data from JSON, whether it's a simple or complex object datatype.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- We use JSON_QUERY to get the JSON representation of the Cars array
SELECT JSON_QUERY(@garage, '$.Cars')
-- Output: [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }]

-- If we combine it with JSON_VALUE we can then pull out specific scalar values
SELECT JSON_VALUE(JSON_QUERY(@garage, '$.Cars') , '$[0].Make')
-- Output: Volkswagen

OPENJSON()

While JSON_VALUE() extracts singular scalar values and JSON_QUERY() extracts singular JSON strings, the OPENJSON() function extracts result sets from a JSON string. In addition to the extracted value, OPENJSON() outputs the order of JSON objects as well as their datatypes. OPENJSON() will also output string representations of JSON arrays instead of just displaying NULL, similar to JSON_QUERY().

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

SELECT * FROM OPENJSON(@garage, '$.Cars') -- Displaying the values of our "Cars" array.  We additionally get the order of the JSON objects outputted in the "key" column and the JSON object datatype in the "type" column
/* Output:
key    value                                                                                                                                type
------ ------------------------------------------------------------------------------------------------------------------------------------ ----
0      { "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }        5
1      { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }    5
*/

SELECT * FROM OPENJSON(@garage, '$.Cars[0]') -- Specifying the first element in our JSON array.  JSON arrays are zero-index based
/* Output:
key              value                                 type
---------------- ------------------------------------- ----
Make             Volkswagen                            1
Model            { "Base": "Golf", "Trim": "GL" }      5
Year             2003                                  2
PurchaseDate     2006-10-05T00:00:00.000Z              1
*/

SELECT * FROM OPENJSON(@garage, '$.Cars[0].Model') -- Pulling the Model property from the first element in our Cars array
/* Output:
key     value   type
------- ------- ----
Base    Golf    1
Trim    GL      1
*/

The flexibility of OPENJSON() makes it possible to extract any values from JSON data, especially when combining OPENJSON() with JSON_VALUE(). The examples below show how to parse out a scalar value from complex JSON objects (like arrays). Note that using the WITH option gives us a lot more flexibility with how we can format our output result.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- Here we retrieve the Make of each vehicle in our Cars array
SELECT JSON_VALUE(value, '$.Make') FROM OPENJSON(@garage, '$.Cars') 
/* Output: 
------------
Volkswagen
Subaru
*/ 

-- Parsing and converting some JSON dates to SQL DateTime2
SELECT CAST(JSON_VALUE(value, '$.PurchaseDate') as datetime2) FROM OPENJSON(@garage, '$.Cars') 
/* Output: 
---------------------------
2006-10-05 00:00:00.0000000
2015-08-18 00:00:00.0000000
*/ 

-- We can also format the output schema of a JSON string using the WITH option.  This is especially cool because we can bring up values from sub-arrays (see Model.Base and Model.Trim) to our top-level row result
SELECT * FROM OPENJSON(@garage, '$.Cars')
 WITH (Make varchar(20) 'strict $.Make',  
       ModelBase nvarchar(100) '$.Model.Base',
       ModelTrim nvarchar(100) '$.Model.Trim',
        Year int '$.Year',  
       PurchaseDate datetime2 '$.PurchaseDate') 
/* Output: 
Make           ModelBase   Year        PurchaseDate
-------------- ----------- ----------- ---------------------------
Volkswagen     Golf        2003        2006-10-05 00:00:00.0000000
Subaru         Impreza     2016        2015-08-18 00:00:00.0000000
*/

These JSON functions should help you parse any JSON data you encounter in SQL server (as long as it's valid and stored as nvarchar). Stay tuned over the next few weeks as we look at other SQL JSON functions that will help us create data, modify data, and compare SQL's JSON performance to other languages.

XmlReader vs XmlDocument Performance

Published Thu 24 November 2016 in Programming

306d1-15myowyb7a3ye9kbyzjdttg

Recently I have been working on a project where I needed to parse XML files that were between 5mb and 20mb in size. Performance was critical for the project, so I wanted to make sure that I would parse these files as quickly as possible.

The two C# classes that I know of for parsing XML are XmlReader and XmlDocument. Based on my understanding of the two classes, XmlReader should perform faster in my scenario because it reads through an XML document only once, never storing more than the current node in memory. On the contrary, XmlDocument stores the whole XML file in memory which has some performance overhead.

Not knowing for certain which method I should use, I decided to write a quick performance test to measure the actual results of these two classes.

The Data

In my project, I knew what data I needed to extract from the XML up front so I decided to configure test in a way that mimics that requirement. If my project required me to run recursive logic in the XML document, needing a piece of information further down in the XML in order to know what pieces of information to pull earlier on from the XML, I would have set up an entirely different test.

For my test, I decided to use the Photography Stack Exchange user data dump as our sample file since it mimics the structure and file size of one my actual project's data. The Stack Exchange data dumps are great sample data sets because they involve real-world data and are released under a Creative Commons license.

The Test

The C# code for my test can be found in its entirety on GitHub.

In my test I created two methods to extract the same exact data from the XML; one of the methods used XmlReader and the other XmlDocument.

The first test uses XmlReader. The XmlReader object only stores a single node in memory at a time, so in order to read through the whole document we need to usewhile(reader.Read()) in order to loop all of the nodes. Inside of the loop, we check if each node is an element that we are looking for and if so then parse out the necessary data:

public static void XmlReaderTest(string filePath)
{
    // We create storage for ids of all of the rows from users where reputation == 1
    List<string> singleRepRowIds = new List<string>();

    using (XmlReader reader = XmlReader.Create(filePath))
    {
        while (reader.Read())
        {
            if (reader.IsStartElement())
            {
                if (reader.Name == "row" && reader.GetAttribute("Reputation") == "1")
                {
                    singleRepRowIds.Add(reader.GetAttribute("Id"));
                }
            }
        }
    }
}

On the other hand, the code for XmlDocument is much simpler: we load the whole XML file into memory and then write a LINQ query to find the elements of interest:

public static void XmlDocumentTest(string filePath)
{
    List<string> singleRepRowIds = new List<string>();

    XmlDocument doc = new XmlDocument();
    doc.Load(filePath);

    singleRepRowIds = doc.GetElementsByTagName("row").Cast<XmlNode>().Where(x => x.Attributes["Reputation"].InnerText == "1").Select(x => x.Attributes["Id"].InnerText).ToList();
}

After writing these two methods and confirming that they are returning the same exact results it was time to pit them against each other. I wrote a method to run each of my two XML parsing methods above 50 times and to take the average elapsed run time of each to eliminate any outlier data:

public static double RunPerformanceTest(string filePath, Action<string> performanceTestMethod)
{
    Stopwatch sw = new Stopwatch();

    int iterations = 50;
    double elapsedMilliseconds = 0;

    // Run the method 50 times to rule out any bias.
    for (var i = 0; i < iterations; i++)
    {
        sw.Restart();
        performanceTestMethod(filePath);
        sw.Stop();

        elapsedMilliseconds += sw.ElapsedMilliseconds;
    }

    // Calculate the average elapsed seconds per run
    double avergeSeconds = (elapsedMilliseconds / iterations) / 1000.0;

    return avergeSeconds;
}

Results and Conclusions

Cutting to the chase, XmlReader performed faster in my test:

Performance test results.

Now, is this ~.14 seconds of speed difference significant? In my case, it is, because I will be parsing many more elements and many more files dozens of times a day. After doing the math, I estimate I will save 45–60 seconds of parsing time for each set of XML files, which is huge in an almost-real-time system.

Would I have come to the same conclusion if blazing fast speed was not one of my requirements? No, I would probably go the XmlDocument route because the code is much cleaner and therefore easier to maintain.

And if my XML files were 50mb, 500mb, or 5gb in size? I would probably still use XmlReader at that point because trying to store 5gb of data in memory will not be pretty.

What about a scenario where I need to go backwards in my XML document — this might be a case where I would use XmlDocument because it is more convenient to go backwards and forwards with that class. However, a hybrid approach might be my best option if the data allows it: if I can use XmlReader to get through the bulk of my content quickly and then load just certain child trees of elements into XmlDocument for easier backwards/forwards traversal, then that would seem like an ideal scenario.

In short, XmlReader was faster than XmlDocumet for me in my scenario. The only way I could come to this conclusion though was by running some real world tests and measuring the performance data.

So should you use XmlReader or XmlDocument in your next project? The answer is it depends.

Coffee's Variety Problem

Published Mon 14 November 2016 in Programming

Freshly roasted coffee beans from central Mexico.

Spaghetti Sauce Origins

During the 1970s consumers had a limited number of spaghetti sauces that they could purchase at their local supermarket. Each store-bought sauce tasted the same, developed in test kitchens from the average flavor preferences of focus groups.

During this time, Prego brand spaghetti sauce was new to the market and was having a difficult time competing against the more established spaghetti sauce brands like Ragu. Prego tasted the same as all of the other brands so capturing loyal customers from other brands was a challenge.

Struggling to become competitive, Prego brought on board Howard Moskowitz. Today some consider Moskowitz the godfather of food-testing and market research, but during the 1970s Moskowitz's beliefs on product-design were contrary to the mainstream establishment. Moskowitz believed that using the average preferences of focus groups to develop a product lead to an average tasting spaghetti sauce: acceptable by most, loved by none.

Instead of giving focus groups similar tasting sauces, Moskowitz decided to test the extremes: have groups compare smooth pureed sauces with sauces that have chunks of vegetables in them, compare mild tasting sauces with ones with lots of spice, and so forth.

The results of this type of testing may seem obvious today, but at the time they were unheard of: people prefer different kinds of spaghetti sauce. It was better for a food company to have a portfolio of niche flavors rather than try to make one product that appealed to everybody.

Today spaghetti sauce no longer has a variety problem.

After all of this initial research Prego made Extra-Chunky spaghetti sauce, which instantly became a hit with all of the people who prefer a thicker type of tomato sauce. In the years since, spaghetti sauce manufactures have caught on to the technique and supermarket shelves today are lined with regular, thick and chunky, three-cheese, garlic, and many other types of sauces that appeal to a wide-array of consumer flavor preferences.

Coffee's Variety Problem

Coffee today has the same problem that spaghetti sauce did a quarter-century ago.

The average supermarket's coffee aisle may look like it has a wide variety of choices: whole beans versus ground coffee versus coffee pods, arabica versus robusta beans, caffeinated versus decaf, medium versus dark roast, etc…

However, once you take a look at what coffee a particular individual may drink — let's say whole bean, medium-roast caffeinated arabica beans -the amount of variety found at a super market is surprisingly small, maybe only 2 or 3 different types.

The limited selection for whole bean, medium roasted coffee at my local super market.

Additionally, it is nearly impossible to know how long the coffee has been sitting unsold on the shelves, it's difficult to find a variety of coffee from different geographic regions around the world, and only if you are extremely lucky can you find a coffee that has been lightly roasted.

Independent coffee shops, especially those that roast their own coffee, offer slightly better options in terms of variety, however they come at a steep price: it is not unusual to see these coffees selling for \$16-\$28 per pound.

I understand these small-batch roasters experience higher costs due to lack of scale, availability of beans, and a slowly developing market to third-wave premium coffee, however paying \$20 for a pound of coffee beans (or even worse, the standard 12oz bag) is not something I can justify doing regularly.

This is what caused me to set out on my quest to get premium quality coffee beans at an affordable price.

The Internet Cafe

While the internet offers advantages in buying premium roasted coffee beans, there are still issues with unknown freshness and high prices, especially when the cost of shipping is included.

Shipping costs and price per pound can be reduced when buying in bulk, but buying in bulk means I'll have roasted beans sitting around for a long time before being consumed, therefore affecting freshness and taste. Short of finding friends who want to split a large coffee order, buying roasted coffee beans online isn't a great option.

What is a great option is buying green, or un-roasted, coffee beans. The shelf life of green beans is up to one year if stored properly and green beans are significantly cheaper than roasted beans because there is less processing involved.

Green, un-roasted coffee beans.

Not only are green beans fresher and cheaper, there is significantly more variety available. Commercial roasters need to buy beans in large quantities in order to be able to sell to coffee shops and supermarkets. This means they are sourcing coffee beans from large commercial farms that are able to supply such a large amount of coffee beans.

If we buy green beans for personal consumption at home, the number of farms that we can buy from is hugely expanded since we only need to buy in small quantities. A farm that only produces a few hundred pounds of beans each year is now within our grasp since commercial roasters would never be able to purchase from them.

There are many retailers online that cater to the home roasting market. My favorite is http://sweetmarias.com and they always have a huge selection of premium beans from all over the world, many for under \$6/pound.

Home Roasting

Roasting beans at home used to be the norm in the early nineteenth century. Roasting coffee beans is similar to making popcorn and can be done over a fire, in a stove, or in the oven. While these methods work, they are messy and involved. Fortunately, cleaner and more scientific options exist.

Commercially available home roasters are one such option, however they cost several hundred to thousands of dollars — well out of my price range.

The most easily obtainable and easy to use home coffee roaster is an electric air-powered popcorn popper. These retail for between \$15–\$25 new. If you decide to try this route, get one without a mesh screen on the bottom.

Once you have an air popcorn popper, roasting coffee beans is as easy as pouring them in, turning on the heat, and waiting until they turn the brown color you are accustomed to seeing.

Although roasting beans can be as easy as turning on the popcorn popper and waiting until the beans reach a desired color, there is a scientific rabbit hole that roasting geeks like me eventually wander down…

Home Roasting 2.0: Web Roast

When I first started roasting coffee beans at home, I started with the air popper method. I soon wanted to start experimenting more with how to make the process more automated, as well as how I could become more precise and play with different variables in order to change the flavors of my roasted coffee.

There are some mods you can make to a regular air popcorn popper to give you more control in how your roast your beans, but ultimately I wanted more control.

I present to you, Web Roast:

The home-made, Internet of Things (IoT) coffee roaster.

The finished project with all code can be found on my project's GitHub page: https://github.com/bertwagner/Coffee-Roaster/

Essentially, this is still an air popper but with much more control. The key features include:

Air temperature probe
Ability to switch the fan and heat source on/off independently
Holding a constant temperature
Automatic graphing of roast profiles
Ability to run saved roast profiles

Now instead of standing at my kitchen counter turning the air popper on and off, holding a digital temperature probe, and shaking the whole roaster to circulate and cool the beans between "on" cycles, I can simply control all of these conditions from my iPhone.

Screen shot of the web app user interface.

From my phone I decide when to heat the beans, when to maintain a certain temperature, how quickly or slowly to hit certain roasting stages or "cracks", and logging to make sure I can reproduce results in future runs.

Beans develop different flavors based on how quickly or slowly they go through different phases of roasting. Some beans might be better suited for quick roasts that will maintain acidic fruit flavors. Other beans might need to be roasted more slowly to bring out nutty and cocoa flavors. The moisture content of a bean will also have an effect on roasting times, as well as beans that are sourced from different regions of the world.

Next Steps

Although I'm extremely satisfied with how my roaster has turned out, there's still a lot on the to-do list.

I'm currently adding functionality to save roast profiles, so after an initial run of desired results, reproducing those results for the same batch of green beans is easy.

In the future, I'd like to build a second, bigger drum-style roaster for being able to roast larger batches at a time.

Follow my Github coffee roaster project page to keep up with any future updates. Also I would love to hear from anyone who has built similar projects of their own.

Good luck and happy roasting!