JSON Support Is The Best New Developer Feature in SQL 2016 — Overview Part 1: Parsing JSON

As a developer my favorite new feature of SQL Server 2016 is JSON support.

I love JSON in SQL because I already love JSON everywhere outside of SQL: it uses much less space than XML for serializing data, it's what most apps now use for API communication, and when doing web development I love that it's already valid JavaScript (no need to deserialize!).

I had this same type of excitement for XML in SQL Server, but after using it the excitement quickly turned into disappointment: having to constantly use the XML datatype was inconvenient (when most XML data I used was already stored in nvarchar(max) columns) and I never found the syntax of OPENXML() to be that intuitive.

Everything I've done with JSON in SQL Server 2016 so far has been great. I've already been storing persistent JSON in SQL, so being able to manipulate JSON within SQL is even better. In this series of posts I will go over the various functionalities of using JSON in SQL Server 2016:

Part 1 — Parsing JSON

What is JSON?

JavaScript Object Notation (JSON) "is a lightweight data-interchange format."

My simple, mostly caveat-free* explanation is that it is a format for storing object data in JavaScript. It's lightweight and easy to read, so it's used in lots of applications that aren't just JavaScript (although it's especially easy to consume in JavaScript because it is JavaScript*).

*Caveats? See http://stackoverflow.com/a/383699

So what's JSON look like? The JSON below represents the current inventory of cars in my garage. It shows I have two cars as well as some of their attributes:

{
    "Cars": [{
        "Make": "Volkswagen",
        "Model": {
            "Base": "Golf",
            "Trim": "GL"
        },
        "Year": 2003,
        "PurchaseDate": "2006-10-05T00:00:00.000Z"
    }, {
        "Make": "Subaru",
        "Model": {
            "Base": "Impreza",
            "Trim": "Premium"
        },
        "Year": 2016,
        "PurchaseDate": "2015-08-18T00:00:00.000Z"
    }]
}

Strict versus Lax mode

For any of the SQL JSON functions (OPENJSON(), JSON_VALUE(), JSON_QUERY(), and JSON_MODIFY()) you can specify whether a non-existing JSON path will return NULL or an error. The default mode is lax, which returns NULL for non-existing JSON paths, whereas strict returns an error message.

-- Lax (default): function will return NULL if an invalid JSON path is specified
SELECT JSON_VALUE('{ "Color" : "Red" }', '$.Shape') -- lax is the default, so you don't need to explicitly state it
-- Output: NULL

SELECT JSON_VALUE('{ "Color" : "Red" }', 'lax $.Shape')
-- Output: NULL

-- Strict: function will return an error if invalid JSON path specified
SELECT JSON_VALUE('{ "Color" : "Red" }', 'strict $.Shape')
-- Output: Property cannot be found on the specified JSON path.

The lax modifier is helpful when writing queries that check whether values exist in a JSON object, while the strict modifier works great for error checking and validation.
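
For a quick illustration of the difference, here is a small sketch of my own (using the same kind of sample JSON as above): strict mode raises a regular SQL error that can be handled with TRY...CATCH, while lax mode simply returns a NULL that we can test for.

BEGIN TRY
    -- Strict mode raises an error for the missing property, so we can handle it like any other SQL error
    SELECT JSON_VALUE('{ "Color" : "Red" }', 'strict $.Shape')
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS JsonPathError
    -- Output: Property cannot be found on the specified JSON path.
END CATCH

-- Lax mode just returns NULL, which is easy to test for when checking whether a value exists
SELECT CASE WHEN JSON_VALUE('{ "Color" : "Red" }', '$.Shape') IS NULL THEN 'No Shape specified' ELSE 'Shape specified' END
-- Output: No Shape specified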

ISJSON()

A simple function for verifying whether a string is valid JSON. This is great for validating JSON formatting before running any of the remaining functions in this post.

SELECT ISJSON('{ "Color" : "Blue" }') -- Returns 1, valid
-- Output: 1

SELECT ISJSON('{ "Color" : Blue }') -- Returns 0, invalid, missing quotes
-- Output: 0

SELECT ISJSON('{ "Number" : 1 }') -- Returns 1, valid, numbers are allowed
-- Output: 1

SELECT ISJSON('{ "PurchaseDate" : "2015-08-18T00:00:00.000Z" }') -- Returns 1, valid, dates are just strings in ISO 8601 date format https://en.wikipedia.org/wiki/ISO_8601
-- Output: 1

SELECT ISJSON('{ "PurchaseDate" : 2015-08-18 }') -- Returns 0, invalid
-- Output: 0

JSON_VALUE()

Extracts a scalar value from a JSON string. This function needs to be able to parse the value, so it will not parse out complex objects like arrays.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

SELECT JSON_VALUE(@garage, '$.Cars[0].Make') -- Return the make of the first car in our array
-- Output: Volkswagen

SELECT CAST(JSON_VALUE(@garage, '$.Cars[0].PurchaseDate') as datetime2) -- Return the Purchase Date of the first car in our array and convert it into a DateTime2 datatype
-- Output: 2006-10-05 00:00:00.0000000

SELECT JSON_VALUE(@garage, '$.Cars') -- This returns NULL because the value of Cars is an array instead of a simple object
-- Output: NULL

SELECT JSON_VALUE(@garage, '$.Cars[1].Model') -- This is also invalid because JSON_VALUE cannot return an array...only scalar values allowed!
-- Output: NULL

SELECT JSON_VALUE(@garage, '$.Cars[1].Model.Base') -- Much better
-- Output: Impreza

JSON_VALUE() is great for accessing operational data that might be using JSON to store multiple attributes for a single entry.
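
For instance, imagine a table that stores each car's extra attributes in a single JSON column. The table and data below are just a hypothetical example for illustration:

-- A hypothetical table storing flexible attributes as JSON in an nvarchar column
DECLARE @CarInventory TABLE (Id int, Attributes nvarchar(500))
INSERT INTO @CarInventory VALUES
    (1, N'{ "Make": "Volkswagen", "Year": 2003 }'),
    (2, N'{ "Make": "Subaru", "Year": 2016 }')

-- JSON_VALUE() lets us select and filter on individual attributes without needing a dedicated column for each one
SELECT Id, JSON_VALUE(Attributes, '$.Make') AS Make
FROM @CarInventory
WHERE JSON_VALUE(Attributes, '$.Year') = '2016'
-- Output: 2, Subaru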

JSON_QUERY()

JSON_QUERY() is meant to work for all of the datatypes that JSON_VALUE() doesn't know how to return: basically JSON_QUERY() returns JSON string representations of complex JSON objects like arrays.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- This returns NULL because the value of Cars is an array instead of a simple object
SELECT JSON_VALUE(@garage, '$.Cars') 
-- Output: NULL

-- Using JSON_QUERY() however returns the JSON string representation of our array object
SELECT JSON_QUERY(@garage, '$.Cars') 
-- Output: [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }]

-- This instance of JSON_VALUE() correctly returns a singular scalar value
SELECT JSON_VALUE(@garage, '$.Cars[0].Make')
-- Output: Volkswagen

-- Using JSON_QUERY() will not work for returning scalar values - it will only return JSON strings for complex objects
SELECT JSON_QUERY(@garage, '$.Cars[0].Make')
-- Output: NULL

It's possible to use JSON_QUERY() along with JSON_VALUE() to essentially extract any type of data from JSON, whether it's a simple or complex object datatype.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- We use JSON_QUERY to get the JSON representation of the Cars array
SELECT JSON_QUERY(@garage, '$.Cars')
-- Output: [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }]

-- If we combine it with JSON_VALUE we can then pull out specific scalar values
SELECT JSON_VALUE(JSON_QUERY(@garage, '$.Cars') , '$[0].Make')
-- Output: Volkswagen

OPENJSON()

While JSON_VALUE() extracts singular scalar values and JSON_QUERY() extracts singular JSON strings, the OPENJSON() function extracts result sets from a JSON string. In addition to the extracted value, OPENJSON() outputs the order of JSON objects as well as their datatypes. OPENJSON() will also output string representations of JSON arrays instead of just displaying NULL, similar to JSON_QUERY().

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

SELECT * FROM OPENJSON(@garage, '$.Cars') -- Displaying the values of our "Cars" array.  We additionally get the order of the JSON objects outputted in the "key" column and the JSON object datatype in the "type" column
/* Output:
key    value                                                                                                                                type
------ ------------------------------------------------------------------------------------------------------------------------------------ ----
0      { "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }        5
1      { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }    5
*/

SELECT * FROM OPENJSON(@garage, '$.Cars[0]') -- Specifying the first element in our JSON array.  JSON arrays are zero-indexed
/* Output:
key              value                                 type
---------------- ------------------------------------- ----
Make             Volkswagen                            1
Model            { "Base": "Golf", "Trim": "GL" }      5
Year             2003                                  2
PurchaseDate     2006-10-05T00:00:00.000Z              1
*/

SELECT * FROM OPENJSON(@garage, '$.Cars[0].Model') -- Pulling the Model property from the first element in our Cars array
/* Output:
key     value   type
------- ------- ----
Base    Golf    1
Trim    GL      1
*/

The flexibility of OPENJSON() makes it possible to extract any values from JSON data, especially when combining OPENJSON() with JSON_VALUE(). The examples below show how to parse out a scalar value from complex JSON objects (like arrays). Note that using the WITH option gives us a lot more flexibility with how we can format our output result.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- Here we retrieve the Make of each vehicle in our Cars array
SELECT JSON_VALUE(value, '$.Make') FROM OPENJSON(@garage, '$.Cars') 
/* Output: 
------------
Volkswagen
Subaru
*/ 

-- Parsing and converting some JSON dates to SQL DateTime2
SELECT CAST(JSON_VALUE(value, '$.PurchaseDate') as datetime2) FROM OPENJSON(@garage, '$.Cars') 
/* Output: 
---------------------------
2006-10-05 00:00:00.0000000
2015-08-18 00:00:00.0000000
*/ 

-- We can also format the output schema of a JSON string using the WITH option.  This is especially cool because we can bring up values from sub-arrays (see Model.Base and Model.Trim) to our top-level row result
SELECT * FROM OPENJSON(@garage, '$.Cars')
 WITH (Make varchar(20) 'strict $.Make',  
       ModelBase nvarchar(100) '$.Model.Base',
       ModelTrim nvarchar(100) '$.Model.Trim',
       Year int '$.Year',  
       PurchaseDate datetime2 '$.PurchaseDate') 
/* Output: 
Make           ModelBase   ModelTrim   Year        PurchaseDate
-------------- ----------- ----------- ----------- ---------------------------
Volkswagen     Golf        GL          2003        2006-10-05 00:00:00.0000000
Subaru         Impreza     Premium     2016        2015-08-18 00:00:00.0000000
*/

These JSON functions should help you parse any JSON data you encounter in SQL Server (as long as it's valid and stored as nvarchar). Stay tuned over the next few weeks as we look at other SQL JSON functions that will help us create data, modify data, and compare SQL's JSON performance to other languages.
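
As a quick recap, here's a small example tying a few of these functions together (the JSON string is made up for this example): validate with ISJSON() first, then parse with OPENJSON() and JSON_VALUE().

-- A made-up JSON string for this recap example
DECLARE @json nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen" }, { "Make": "Subaru" }] }'

IF ISJSON(@json) = 1
    SELECT JSON_VALUE(value, '$.Make') AS Make
    FROM OPENJSON(@json, '$.Cars')
ELSE
    SELECT 'Invalid JSON string' AS ErrorMessage
/* Output:
Make
----------
Volkswagen
Subaru
*/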

Present Like Nobody's Watching

Presenting online — more like a one-on-one discussion than a presentation in front of hundreds of people

Last week I presented my session High Performance SSRS at the inaugural GroupBy Conference. It was an incredibly fun experience getting to present a topic I'm excited about, as well as seeing others do the same at such a high caliber.

The thing that makes GroupBy different from other technical conferences is that it's free, it's community driven, and it's online-only. This makes it a very inclusive conference since the barriers to attending are extremely low. The only significant barrier to watching the sessions live was a software restriction on the number of live attendees. Fortunately, all of the sessions were made available on the GroupBy YouTube channel soon afterwards so anybody can watch them at any time.

The 1,000 live attendee registration cap was quickly reached

Even though the conference is only half-over (more excellent presentations coming this Friday January 20 — go register!), my session is complete so I thought it would be fun to go over what made presenting at GroupBy different from an ordinary in-person conference.

Pre-Conference

There wasn't too much of a difference preparing for GroupBy compared to other speaking engagements. I still had to create a presentation that captured the audience's attention and told a good story. I also had a lot of rehearsing to do, especially since my presentation consisted of mostly live demos. In case you've never presented something with live demos, here's a secret: something always goes wrong. This means a lot of my preparation involved thinking about what could go wrong and trying to prepare for it so I wouldn't embarrass myself too badly in front of a live audience.

The main difference between this talk and others I've given is that I was entirely on my own when it came to audio and video setup.

Broadcasting from my kitchen on conference day

Having both an audio (musician) and a video (photographer) background, I probably went a little overboard with my setup. During my practice recording sessions, I kept finding little details that annoyed me, which forced me into the setup you see above. Ugly background? Move to the kitchen where there is a solid white wall. Bad lighting? Turn on all of the lights and bring in extra lamps. Laptop microphone sounds muffled? Pull out the condenser and boom stand.

Conference Day

Since I didn't have to travel anywhere to speak, I just went about my normal day and went in to work. We booked a conference room (zing!) and had a watch party, ate some pizza, and learned about SQL Server 2016 built-in performance boosts, automation with PowerShell, SQL at StackOverflow, SQL Server 2016 hidden gems, what not to do with SQL, and Power BI. During the lunch break, I went home to get ready for my session.

My actual session went well and was strangely comfortable to present since I got to do it in my home, in my regular programming chair, 3 feet away from my regular programming fridge (need caffeine from somewhere right?).

The one daunting aspect of my presentation was that it was difficult to get a sense of how the audience was following along. Although Brent was great at laughing at my jokes, commenting, and moderating audience questions, it was hard for me to judge whether I needed to speed up sections or slow down and explain in more detail. With a live audience, I can read body language and adjust. Online, it felt like nobody was watching.

Post-Conference

People were watching, however, because soon after my presentation was over I got flooded with texts, emails, and comments about my session. Session ratings flowed in over the next few days, and I was happy to get feedback indicating that my session was well received.

And although there were no speaker dinners or sponsored after parties, Renee and I made up for it by going to our favorite sushi bar to celebrate!

Presenting in the virtual world was definitely a very different experience for me but I loved it. Would I do it again? No doubt, and I hope you do too (abstract deadline is February 28 — get typing!).

The Most Lacking Feature In SSRS (7 years and no fix?)

This post is a response to this month's T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month's topic is SQL Server Bugs & Enhancement Requests.


Being a frequent user of SQL Server Reporting Services for the past 6 years has made me blind to the annoyances that I had with the software when I first started using it. For example, I've learned to deal with not being able to easily change the order that reports appear in the Solution Explorer. I am no longer frustrated that I can't change the order of datasets in the Report Data pane. I've grown accustomed to editing the report's XML to work around these annoyances and many others.

Even though I think SSRS is a superb piece of software (no other tool I know of can generate reports so easily with as much flexibility as SSRS) there is one lacking feature that still drives me nuts:

Why can't I organize my reports in sub folders!?

Don't be fooled by my photo manipulating skills, this New Folder button doesn't actually exist in SSRS.

Seriously. Visual Studio allows me to organize my files in sub folders in nearly all other project types: C# console apps? Check. ASP.NET MVC solutions? Check.

Why then can't my SSRS solutions do the same thing?

If you look at the Microsoft Connect item for this issue, more than 200 people agree. It's ridiculous this functionality isn't built in. Not only does Visual Studio have the capability in other solution types, but reports can already be deployed to multiple different folders on the SSRS server itself, leaving the only missing link a context menu action that says "Create New Folder." I know what I'm asking for here is a "basic" change, nothing nearly as complicated as adding an additional QUALIFY filtering clause (which would be great to have too), but that's all the more reason this should have been fixed a long time ago!

However, there is a brief glimmer of hope. Microsoft has been releasing updates to SQL Server Data Tools somewhat regularly the past year, including bug fixes and feature requests from Connect feedback. Let's hope they continue to get better about fixing issues like these so that everyone will be able to right click > Create New Folder in their SSRS projects sometime in the near future.

Epic Life Quests

It is easy to get caught up in the daily details of life and not take the time to reflect on longer term goals and accomplishments.

Inspired by Brent Ozar and Steve Kamb, these Epic Life Quests are intended to help me reflect on my accomplishments and help me stay focused on the things that are important.

Each level contains five achievements and once all are completed I can "level up" to the next five. Follow along and let me know if you create any epic life quests of your own.

Level 2 (currently working on)

  • Blog weekly for 6 months straight — Last year I began blogging more than any previous year, but I didn't always stick to a schedule. My biggest problem was that I didn't know what I wanted to write about, so choosing topics was difficult and frustrating. After looking back at which posts were the best received, I've decided to focus the first half of 2017 on mostly technical and professional development topics.
  • Speak at a technical conference — I've been presenting technical content in small user group type settings for years but this year I want to try my hand at some larger audiences. Completed January 2017 at the GroupBy conference.
  • Vacation in Hawaii — Our vacations in 2016 focused on places we could reach by car so that we could save some money for a larger trip. This will be the bigger trip. After visiting Hawaii, I will have visited 36 states + Washington D.C. (airports don't count!).
  • Work on mental mindfulness — practice meditation to improve focus and patience, manage stress, and be happier. I want to average at least 5 days/week for 3 months to reach this goal.
  • Always be reading at least one book — Although I read 40+ books in 2016, there were stretches of weeks at a time where I wasn't reading anything. For 6 months I don't want to go more than 3 days without having a book picked out and available to read.

Level 1 Quests (completed before 2017)

Here are some of my achievements before I started this page on January 1, 2017.

  • Set up an environment for programming regularly at home — completed 2016
  • Start using GitHub for my coding projects — completed 2016
  • Built a hardware project that gets regular use (not just proof of concept) — completed 2016
  • Buy a house — completed 2015
  • Get into backpacking — completed 2015
  • Take a car trip in Europe with Renee — completed 2015
  • Learn to read books for pleasure — completed 2014

XmlReader vs XmlDocument Performance

Recently I have been working on a project where I needed to parse XML files that were between 5mb and 20mb in size. Performance was critical for the project, so I wanted to make sure that I would parse these files as quickly as possible.

The two C# classes that I know of for parsing XML are XmlReader and XmlDocument. Based on my understanding of the two classes, XmlReader should perform faster in my scenario because it reads through an XML document only once, never storing more than the current node in memory. In contrast, XmlDocument stores the whole XML file in memory, which has some performance overhead.

Not knowing for certain which method I should use, I decided to write a quick performance test to measure the actual results of these two classes.

The Data

In my project, I knew up front what data I needed to extract from the XML, so I decided to configure my test in a way that mimics that requirement. If my project had required me to run recursive logic over the XML document, needing a piece of information further down in the XML in order to know which pieces of information to pull from earlier on, I would have set up an entirely different test.

For my test, I decided to use the Photography Stack Exchange user data dump as my sample file since it mimics the structure and file size of one of my actual project's data files. The Stack Exchange data dumps are great sample data sets because they involve real-world data and are released under a Creative Commons license.

The Test

The C# code for my test can be found in its entirety on GitHub.

In my test I created two methods to extract the same exact data from the XML; one of the methods used XmlReader and the other XmlDocument.

The first test uses XmlReader. The XmlReader object only stores a single node in memory at a time, so in order to read through the whole document we need to use while(reader.Read()) to loop through all of the nodes. Inside of the loop, we check whether each node is an element that we are looking for and, if so, parse out the necessary data:

public static void XmlReaderTest(string filePath)
{
    // We create storage for ids of all of the rows from users where reputation == 1
    List<string> singleRepRowIds = new List<string>();

    using (XmlReader reader = XmlReader.Create(filePath))
    {
        while (reader.Read())
        {
            if (reader.IsStartElement())
            {
                if (reader.Name == "row" && reader.GetAttribute("Reputation") == "1")
                {
                    singleRepRowIds.Add(reader.GetAttribute("Id"));
                }
            }
        }
    }
}

On the other hand, the code for XmlDocument is much simpler: we load the whole XML file into memory and then write a LINQ query to find the elements of interest:

public static void XmlDocumentTest(string filePath)
{
    List<string> singleRepRowIds = new List<string>();

    XmlDocument doc = new XmlDocument();
    doc.Load(filePath);

    singleRepRowIds = doc.GetElementsByTagName("row")
                         .Cast<XmlNode>()
                         .Where(x => x.Attributes["Reputation"].InnerText == "1")
                         .Select(x => x.Attributes["Id"].InnerText)
                         .ToList();
}

After writing these two methods and confirming that they are returning the same exact results it was time to pit them against each other. I wrote a method to run each of my two XML parsing methods above 50 times and to take the average elapsed run time of each to eliminate any outlier data:

public static double RunPerformanceTest(string filePath, Action<string> performanceTestMethod)
{
    Stopwatch sw = new Stopwatch();

    int iterations = 50;
    double elapsedMilliseconds = 0;

    // Run the method 50 times to rule out any bias.
    for (var i = 0; i < iterations; i++)
    {
        sw.Restart();
        performanceTestMethod(filePath);
        sw.Stop();

        elapsedMilliseconds += sw.ElapsedMilliseconds;
    }

    // Calculate the average elapsed seconds per run
    double averageSeconds = (elapsedMilliseconds / iterations) / 1000.0;

    return averageSeconds;
}

Results and Conclusions

Cutting to the chase, XmlReader performed faster in my test:

Performance test results.

Now, is this ~0.14 seconds of speed difference significant? In my case it is, because I will be parsing many more elements and many more files dozens of times a day. After doing the math, I estimate I will save 45–60 seconds of parsing time for each set of XML files, which is huge in an almost-real-time system.

Would I have come to the same conclusion if blazing fast speed was not one of my requirements? No, I would probably go the XmlDocument route because the code is much cleaner and therefore easier to maintain.

And if my XML files were 50mb, 500mb, or 5gb in size? I would probably still use XmlReader at that point because trying to store 5gb of data in memory will not be pretty.

What about a scenario where I need to go backwards in my XML document? That might be a case where I would use XmlDocument, because it is more convenient to move backwards and forwards with that class. However, a hybrid approach might be my best option if the data allows it: if I can use XmlReader to get through the bulk of my content quickly and then load just certain child trees of elements into XmlDocument for easier backwards/forwards traversal, that would seem like an ideal scenario.
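
Here's a rough sketch of what that hybrid approach might look like, using the same Stack Exchange "row" elements. This is just my illustration of the idea, not code from my actual project:

public static void HybridTest(string filePath)
{
    // A hypothetical example of the hybrid approach: stream with XmlReader,
    // then materialize only the interesting subtrees into XmlDocument
    using (XmlReader reader = XmlReader.Create(filePath))
    {
        while (reader.Read())
        {
            if (reader.IsStartElement() && reader.Name == "row")
            {
                // Load just this element (and its children) into an XmlDocument
                // so we can traverse it backwards and forwards in memory
                XmlDocument subtree = new XmlDocument();
                subtree.Load(reader.ReadSubtree());

                string id = subtree.DocumentElement.Attributes["Id"].InnerText;
            }
        }
    }
}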

In short, XmlReader was faster than XmlDocument for me in my scenario. The only way I could come to this conclusion, though, was by running some real-world tests and measuring the performance data.

So should you use XmlReader or XmlDocument in your next project? The answer is it depends.