The Ultimate SQL Server JSON Cheat Sheet

Examples for handling JSON data in SQL Server 2016+

This post is a quick reference of my examples for processing JSON data in SQL Server. For more detailed explanations of these functions, please see my post series on JSON in SQL Server 2016.

Additionally, the complete reference for SQL JSON handling can be found at MSDN: https://msdn.microsoft.com/en-us/library/dn921897.aspx


Parsing JSON

Getting JSON string data into a SQL-readable form.

ISJSON()

Checks to see if the input string is valid JSON.
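A minimal example; the sample strings are made up for illustration:

    SELECT ISJSON(N'{ "Make": "Volkswagen" }');  -- returns 1
    SELECT ISJSON(N'Not valid JSON');            -- returns 0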

JSON_VALUE()

Extracts a scalar value from a JSON string using a JSON path expression.
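A minimal sketch; the @json variable and its properties are illustrative:

    DECLARE @json NVARCHAR(MAX) =
        N'{ "Car": { "Make": "Volkswagen", "Model": "Golf" } }';

    SELECT JSON_VALUE(@json, '$.Car.Make');  -- returns Volkswagen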

Strict vs. Lax mode

Determines whether the function returns NULL (lax mode) or raises an error (strict mode) when the JSON path cannot be found.
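For example (lax is the default when neither keyword is specified; the sample data is made up):

    DECLARE @json NVARCHAR(MAX) = N'{ "Make": "Volkswagen" }';

    SELECT JSON_VALUE(@json, 'lax $.Model');     -- returns NULL
    SELECT JSON_VALUE(@json, 'strict $.Model');  -- raises an error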

JSON_QUERY()

Returns a JSON fragment for the specified JSON path.

This is useful for filtering an array down to a fragment and then extracting values from it with JSON_VALUE():
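(A minimal sketch with made-up sample data.)

    DECLARE @json NVARCHAR(MAX) =
        N'{ "Cars": [ { "Make": "Volkswagen" }, { "Make": "Subaru" } ] }';

    -- Return an object or array fragment (array indexes are zero-based)
    SELECT JSON_QUERY(@json, '$.Cars[1]');  -- returns { "Make": "Subaru" }

    -- Filter down to a fragment, then pull a scalar out of it
    SELECT JSON_VALUE(JSON_QUERY(@json, '$.Cars[1]'), '$.Make');  -- returns Subaru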

OPENJSON()

Returns a SQL result set for the specified JSON path: one row per key/value pair or array element, with columns identifying each parsed value and its datatype.
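(A minimal sketch with made-up sample data.)

    DECLARE @json NVARCHAR(MAX) =
        N'{ "Make": "Volkswagen", "Year": 2003, "Options": [ "Sunroof" ] }';

    -- One row per key/value pair; the type column encodes the parsed datatype
    -- (0 = null, 1 = string, 2 = number, 3 = bool, 4 = array, 5 = object)
    SELECT [key], [value], [type]
    FROM OPENJSON(@json);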

Creating JSON

Creating JSON data from either strings or result sets.

FOR JSON AUTO

Automatically creates a JSON string from a SELECT statement. Quick and dirty.
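A minimal sketch; dbo.Cars and its columns are assumed sample objects:

    -- SQL Server infers the JSON structure from the SELECT statement
    SELECT TOP 2 Make, Model
    FROM dbo.Cars
    FOR JSON AUTO;
    -- e.g. [{"Make":"Volkswagen","Model":"Golf"},{"Make":"Subaru","Model":"Impreza"}]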

FOR JSON PATH

Formats a SQL query into a JSON string, allowing the user to define structure and formatting.
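A minimal sketch using the same assumed dbo.Cars table:

    -- Dot-separated column aliases define the nesting of the output JSON
    SELECT TOP 2
           Make  AS [Car.Make],
           Model AS [Car.Model]
    FROM dbo.Cars
    FOR JSON PATH, ROOT('Cars');
    -- e.g. {"Cars":[{"Car":{"Make":"Volkswagen","Model":"Golf"}}, ...]}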

Modifying JSON

Updating, adding to, and deleting from JSON data.

JSON_MODIFY()

Allows the user to update properties and values, add properties and values, and delete properties and values (deleting is unintuitive; see below).

Modify:
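(A hedged sketch; the sample data is made up.)

    DECLARE @json NVARCHAR(MAX) = N'{ "Make": "Volkswagen", "Model": "Golf" }';

    -- Overwrite an existing property's value
    SELECT JSON_MODIFY(@json, '$.Model', 'Polo');
    -- returns { "Make": "Volkswagen", "Model": "Polo" }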

Add:
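(Again with made-up sample data.)

    DECLARE @json NVARCHAR(MAX) = N'{ "Make": "Volkswagen" }';

    -- In (default) lax mode, setting a missing property creates it
    SELECT JSON_MODIFY(@json, '$.Year', 2003);
    -- returns { "Make": "Volkswagen", "Year": 2003 }

    -- The append keyword adds an element to an existing array
    SELECT JSON_MODIFY(N'{ "Options": [ "Sunroof" ] }', 'append $.Options', 'Spoiler');
    -- returns { "Options": [ "Sunroof", "Spoiler" ] }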

Delete property:
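(With made-up sample data.)

    DECLARE @json NVARCHAR(MAX) = N'{ "Make": "Volkswagen", "Model": "Golf" }';

    -- Setting a property to NULL removes it entirely
    SELECT JSON_MODIFY(@json, '$.Model', NULL);
    -- returns { "Make": "Volkswagen" }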

Delete from array (this is not intuitive; see my Microsoft Connect item to fix this: https://connect.microsoft.com/SQLServer/feedback/details/3120404/sql-modify-json-null-delete-is-not-consistent-between-properties-and-arrays):
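(A sketch of the inconsistency and a workaround; the sample data is made up.)

    DECLARE @json NVARCHAR(MAX) = N'{ "Options": [ "Sunroof", "Spoiler" ] }';

    -- Unlike with properties, setting an array element to NULL does not
    -- remove it; it leaves a JSON null behind (hence the Connect item above)
    SELECT JSON_MODIFY(@json, '$.Options[0]', NULL);
    -- returns { "Options": [ null, "Spoiler" ] }

    -- To truly remove an element, overwrite the array with what you want to keep
    -- (JSON_QUERY() keeps the new array from being escaped as a scalar string)
    SELECT JSON_MODIFY(@json, '$.Options', JSON_QUERY(N'[ "Spoiler" ]'));
    -- returns { "Options": [ "Spoiler" ] }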

SQL JSON Performance Tuning

SQL JSON functions are already fast. Adding computed columns and indexes makes them extremely fast.

Computed Column JSON Indexes

JSON indexes are simply regular indexes on computed columns.

Add a computed column:
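(A sketch, assuming a dbo.Cars table whose CarDetails column stores each row’s JSON; the lowercase make property is also an assumption.)

    ALTER TABLE dbo.Cars
    ADD CarMake AS JSON_VALUE(CarDetails, '$.make');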

Add an index to our computed column:
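(The index name is illustrative.)

    CREATE NONCLUSTERED INDEX IX_Cars_CarMake
    ON dbo.Cars (CarMake);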

Performance test:
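(A rough illustration of how to compare the two filters; the literal value is made up.)

    SET STATISTICS TIME ON;

    -- Both filters can use the index: the optimizer matches the raw
    -- JSON_VALUE() expression to the indexed computed column
    SELECT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.make') = 'ACURA';
    SELECT * FROM dbo.Cars WHERE CarMake = 'ACURA';

    SET STATISTICS TIME OFF;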


Book Notes: The Art of Readable Code by Dustin Boswell

I highlight and take notes when I read nonfiction books. Once I finish a book, I format and edit my notes so that I can easily remind myself of what I learned without having to reread the book. These notes are not a substitute for reading the book; they only serve as a reminder of key concepts.


The fewer lines a variable is in scope for, the shorter its name can be. Variables used in larger scopes must be less ambiguous, so they are allowed longer names.

Don’t use project-specific abbreviations; someone new to the project won’t know what they mean. “So our rule of thumb is: would a new teammate understand what the name means? If so, then it’s probably okay.”

“The clearest way to name a limit is to put max_ or min_ in front of the thing being limited.”

“In general, adding words like is, has, can, or should can make booleans more clear.”

“The best names are ones that can’t be misconstrued — the person reading your code will understand it the way you meant it, and no other way.”

“Everyone prefers to read code that’s aesthetically pleasing. By “formatting” your code in a consistent, meaningful way, you make it easier and faster to read.”

“The purpose of commenting is to help the reader know as much as the writer did.”

More comments aren’t necessarily better — the comments take up valuable screen real estate, so if there is a comment in the code “it better be worth it”.

“Don’t Comment Bad Names — Fix the Names Instead”

“Instead of minimizing the number of lines, a better metric is to minimize the time needed for someone to understand it.”

“The simplest way to break down an expression is to introduce an extra variable that captures a smaller subexpression. This extra variable is sometimes called an “explaining variable” because it helps explain what the subexpression means.”

“One technique [to clean up code] is to see if you can solve the problem the “opposite” way. Depending on the situation you’re in, this could mean iterating through arrays in reverse or filling in some data structure backward rather than forward.”

In general, keep scopes short and finish tasks as quickly as possible.

“The more places a variable is manipulated, the harder it is to reason about its current value.”

“The advice for this chapter is to aggressively identify and extract unrelated subproblems. Here’s what we mean: Look at a given function or block of code, and ask yourself, “What is the high-level goal of this code?” For each line of code, ask, “Is it working directly to that goal? Or is it solving an unrelated subproblem needed to meet it?” If enough lines are solving an unrelated subproblem, extract that code into a separate function.”

“Code that does multiple things at once is harder to understand. A single block of code might be initializing new objects, cleansing data, parsing inputs, and applying business logic, all at the same time.”

Break code up into small, easy-to-understand fragments. This makes the code easier to understand and reduces the potential for bugs.

“do only one task at a time.”

“When explaining a complex idea to someone, it’s easy to confuse them with all the little details. It’s a valuable skill to be able to explain an idea “in plain English,” so that someone less knowledgeable than you can understand. It requires distilling an idea down to the most important concepts. Doing this not only helps the other person understand but also helps you think about your own ideas more clearly.”

“This chapter discussed the simple technique of describing your program in plain English and using that description to help you write more natural code. This technique is deceptively simple, but very powerful.”

“Knowing when not to code is possibly the most important skill a programmer can learn. Every line of code you write is a line that has to be tested and maintained. By reusing libraries or eliminating features, you can save time and keep your codebase lean and mean.”

“When you start a project, it’s natural to get excited and think of all the cool features you’ll want to implement. But programmers tend to overestimate how many features are truly essential to their project. A lot of features go unfinished or unused or just complicate the application.”


JSON Support Is The Best New Developer Feature in SQL 2016 — Part 4: Performance Comparisons

This is the fourth article in my series about learning how to use SQL Server 2016’s new JSON functions. If you haven’t already, you can read Part 1 — Parsing JSON, Part 2 — Creating JSON, and Part 3 — Updating, Adding, and Deleting JSON.


Additional performance comparisons are available in an updated post.

We’ve finally come to my favorite part of analyzing any new software feature: performance testing. SQL Server 2016’s new JSON functions are great for parsing JSON data, creating JSON data, and modifying JSON data, but are they efficient?

Today we’ll examine three areas of SQL Server JSON performance:

  1. How to maximize performance for the new SQL Server JSON functions
  2. How the new SQL Server JSON functions compare against what was previously available in SQL Server
  3. How the new SQL JSON functions compare against Newtonsoft’s Json.NET

Maximizing SQL Server JSON Function Performance

I wanted to use a sizable data set to test the performance of the new JSON functions in SQL Server 2016. I found arthurkao’s car year/make/model data on GitHub and decided this ~20k element JSON array would be perfect for performance testing purposes. For my tests I’ll be using both the original JSON string and a SQL table that I created from the original JSON array:
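(The original setup script isn’t reproduced here; this is a sketch of the idea, assuming the GitHub data is an array of year/make/model objects, truncated to two elements.)

    DECLARE @carsJSON NVARCHAR(MAX) = N'[
        { "year": 2001, "make": "ACURA", "model": "CL" },
        { "year": 2001, "make": "ACURA", "model": "EL" }
    ]';

    -- One row per array element, storing each element's JSON fragment
    SELECT [value] AS CarDetails
    INTO dbo.Cars
    FROM OPENJSON(@carsJSON);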

Unlike XML in SQL Server (which is stored in its own datatype), JSON in SQL Server 2016 is stored as an NVARCHAR. This means that instead of needing special indexes, we can use the indexes we are already familiar with.

To maximize performance, we can use Microsoft’s recommendation of adding a computed column for one of the JSON properties and then indexing that computed column:
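(A minimal sketch, reusing the CarDetails column from the setup above; the lowercase make property is an assumption about the data.)

    -- Non-persisted computed column that parses the make out of the stored JSON
    ALTER TABLE dbo.Cars
    ADD CarMake AS JSON_VALUE(CarDetails, '$.make');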

Non-persisted computed columns (like in the example above) do not take up any additional space in the table. You can verify this for yourself by running sp_spaceused 'dbo.Cars' before and after adding the non-persisted column to the table.

Having a computed column doesn’t improve query performance on its own, but it does allow us to add an index on our parsed/computed JSON property.

Having the computed column doesn’t improve performance — we are still seeing a Table Scan

The index that we add next stores the parsed/computed values, so the base table doesn’t take up any additional space; the SQL engine only has to recompute the column values when the index needs to be rebuilt:
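(The index name is illustrative; a clustered index works here because the SELECT ... INTO above created dbo.Cars as a heap.)

    CREATE CLUSTERED INDEX CL_CarMake
    ON dbo.Cars (CarMake);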

And the resulting execution plan now shows both queries (the one using JSON_VALUE() in the WHERE clause directly, as well as the one filtering on our computed column) using index seeks to find the data we are looking for:

Yay index seeks!

Overall, adding computed columns to our table adds no overhead in terms of storage space and allows us to then add indexes on JSON properties which improve performance significantly.

SQL Server 2016 JSON vs SQL Server pre-2016 JSON

As I’ve mentioned before, the best option for processing JSON data in SQL Server before 2016 was by using Phil Factor’s amazing JSON parsing function. Although the function works well, it is limited by what SQL Server functionality was available at the time and therefore wasn’t all that efficient.
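(The original query isn’t reproduced here; it was roughly the following, assuming Phil Factor’s dbo.parseJSON table-valued function has been installed and @carsJSON holds the full ~20k element array.)

    -- Shred the whole array with the pre-2016 UDF (extremely slow at this scale)
    SELECT *
    FROM dbo.parseJSON(@carsJSON);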

The above query should work for getting the data we need, but I’m abusing what the parseJSON function was probably built to do (I doubt it was intended to parse a ~20k element JSON array), and I’ll be honest: I waited 10 minutes before killing the query. Basically, trying to parse this much data in SQL before 2016 just wasn’t possible (unless you wrote CLR).

Compare that to the following query, which uses our indexed computed column; SQL Server 2016 is able to return all of the results to us in 1 ms:
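(A sketch using the table and computed column from earlier; the literal value is made up.)

    -- The indexed computed column turns this filter into an index seek
    SELECT CarDetails
    FROM dbo.Cars
    WHERE CarMake = 'ACURA';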

SQL Server 2016 JSON vs Newtonsoft’s Json.NET

In cases like the above, where parsing JSON in SQL Server was never an option, my preferred method has always been to parse data in C#. In particular, Newtonsoft’s Json.NET is the standard for high-performance JSON parsing, so let’s take a look at how SQL Server 2016 compares to it.

The following code shows 6 tests I ran in SQL Server 2016:
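(The original six scripts aren’t reproduced here; the following is only a rough, hypothetical illustration of the kinds of operations being timed.)

    -- Illustrative only, not the original test scripts; @carsJSON is a
    -- stand-in for the full ~20k element array from earlier
    DECLARE @carsJSON NVARCHAR(MAX) = N'[ { "year": 2001, "make": "ACURA", "model": "CL" } ]';

    SET STATISTICS TIME ON;

    -- Parse: shred the array into rows
    SELECT * FROM OPENJSON(@carsJSON);

    -- Create: serialize rows back into a JSON array
    SELECT CarMake AS make FROM dbo.Cars FOR JSON PATH;

    -- Modify: rewrite a property inside the JSON string
    SELECT JSON_MODIFY(@carsJSON, '$[0].make', 'VOLKSWAGEN');

    SET STATISTICS TIME OFF;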

And then the same tests in a C# console app using Json.Net:

And the results compared side by side:

Essentially, it seems like Json.NET beats SQL Server 2016 on larger JSON manipulations, the two are roughly equal with small JSON objects, and SQL Server 2016 has the advantage when filtering JSON data with indexes.

Conclusion

SQL Server 2016 is excellent at working with JSON. Even though Json.NET beats SQL Server 2016 at working with large JSON objects (on the magnitude of milliseconds), SQL Server is equally fast on smaller objects and is advantageous when JSON data needs to be filtered or searched.

I look forward to using the SQL Server 2016 JSON functions more in the future, especially in instances where network I/O makes it beneficial to process JSON on the SQL Server, or when working with applications that cannot process JSON data, like SQL Server Reporting Services.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!