Every once in a while I hear of some technologist say that relational databases are dead; instead, a non-table based NoSQL storage format is the way of the future. SQL Server 2016 introduced JSON functionality, making it possible for some “non-SQL” data storage to make its way into the traditionally tabled-based SQL Server.
Does this mean all data in SQL Server going forward should be stored in long JSON strings? No, that would be a terrible idea. There are instances when storing JSON in SQL Server is a good choice though. In this post I want to create recommendations for when data should be stored as JSON and when it shouldn’t.
Databases Should Not Be Entirely Comprised Of JSON
The screenshot below is an example of what I think some developers would do if they were given free reign in SQL Server 2016:
Here we have an application database (“InventoryApp”) that consists of only a single table (“dbo.Data”) with three JSON NVARCHAR(MAX) columns to represent all of the data required by the app. Relationships exist between Sales, Purchases, and Customers but these are not defined on the database side.
If you are from the world of relational-SQL, you might not believe that anyone would design such a database structure. Believe me though, this is a realistic scenario. Entire companies (eg. Firebase: https://firebase.google.com/) build their services around abstracting the database layer away from developers, essentially storing entire tables or databases in large JSON strings.
Many developers like storing data this way because it is easy to deserialize JSON strings into objects in their programming languages to use in their apps. They like the fact that with JSON they can have an infinitely changing storage schema (just add new keys, values, and arrays!) so if they need a new field for their app, they can just add it in, serialize the object to a JSON string, and store it again in the database.
Obviously, going completely “NoSQL” might make short term development easier/quicker, but using SQL Server 2016 to only store data this way is a travesty: there’s no way to use many of SQL Server’s amazing performance, schema definition and validation, and security features.
So when is it appropriate to store JSON in SQL Server?
Appropriate Use Case #1: Error Logging
Errors happen. When they do, it’s nice to be able to go back and look at the error message to see what happened.
The problem is that the structure of error messages isn’t always consistent. Sometimes only the value of a single property will help identify the cause of failure. Other times, something more complex fails and it would be nice to have all of the values of a complex object available to make troubleshooting easier.
This is where JSON steps in: in most programming languages, it is easy to convert error messages and run time values to a JSON object on error. And since error messages and data values change in structure depending on where they occur, it’s easy to dynamically turn any type of object into JSON data.
This data is perfect to store in SQL to be looked at later. None of these ideas are new — nvarchar(max) has been in SQL for a while now, and so programmers everywhere have been storing error information in that datatype.
Most large workplaces have controls in place that prevent developers from making changes in production. In general this is a Good Idea™.
However, controls are sometimes too restrictive. For example, due to security restrictions, lack of server space, company politics, etc… developers are sometimes stuck developing in production. It’s an unfortunate fact of life. In those scenarios, developers have to go through hell if they have to elevate each database structure change every time they want to test something in production.
JSON to the rescue! An nvarchar(max) column in a table can have its JSON data be easily added to and modified to fit more data than it was originally intended to hold. All without any database structure change requests.
Now this is not an ideal situation. In fact, it’s a scenario that can add a lot of technical debt to the application long-term if not planned for.
However, if a “flexible” JSON column is built with eventual conversion to a traditional table structure in mind from the start, it’s actually simple for a developer to transition an entirely JSON storage structure to a relational format later on. They key here is that the developer needs to have this conversion planned from day one.
Appropriate Use Case #3: Non-Analytical Data
Analytical data is SQL Server’s bread and butter. Need to store lots of data and be able to query against it all day long? No problem, there are a plethora of performance tuning options to make your queries run fast and efficiently.
However, sometimes not all data needs to be analyzed. Often an app might need to save some session data to a database temporarily — why bother creating all of the maintenance overhead of strict database schemas if the data will never be queried for analytical purposes? Another example might be a website’s dynamically created user profile settings. You can build normalized table(s) to store all of that data, but then you will be writing programming logic to normalize and denormalize your data out of the app.
If this data will not have to be searched, then why bother adding all of the overhead? Keep it in JSON and be done with.
SELECT *FROM OPENJSON(@garage,'$.Cars')--Displaying the values of our"Cars"array.We additionally get the order of the JSON objects outputted inthe"key"column andthe JSON objectdatatype inthe"type"column
-- To truly delete it (and not have the NULL appear as the first item in the array) we have to convert to a rowset, select everything that'snotthe first row,aggregate the rows intoastring(UGH)andthenrecreate asJSON.
--Thisisincredibly ugly.The STREAM_AGG()functioninSQL vNext should make italittle cleaner,but why doesn't the JSON_MODIFY NULL syntax just get rid of the element in the array?
-- I have opened a Microsoft connect issue for this here: https://connect.microsoft.com/SQLServer/feedback/details/3120404
I highlight and take notes when I read nonfiction books. Once I finish a book, I format and edit my notes so that I can easily remind myself of what I learned without having to reread the book. These notes are not a substitute for reading the book, they only serve as a reminder of key concepts.
The fewer lines a variable is in scope, the shorter its name is allowed to be. Variables used in larger scopes must be less ambiguous so they are allowed to have longer names.
Don’t use project specific abbreviations — someone new to the project won’t know what they mean. “So our rule of thumb is: would a new teammate understand what the name means? If so, then it’s probably okay.”
“The clearest way to name a limit is to put max_ or min_ in front of the thing being limited.”
“In general, adding words like is, has, can, or should can make booleans more clear.”
“The best names are ones that can’t be misconstrued — the person reading your code will understand it the way you meant it, and no other way.”
“Everyone prefers to read code that’s aesthetically pleasing. By “formatting” your code in a consistent, meaningful way, you make it easier and faster to read.”
“The purpose of commenting is to help the reader know as much as the writer did.”
More comments aren’t necessarily better — the comments take up valuable screen real estate, so if there is a comment in the code “it better be worth it”.
“Don’t Comment Bad Names — Fix the Names Instead”
“Instead of minimizing the number of lines, a better metric is to minimize the time needed for someone to understand it.”
“The simplest way to break down an expression is to introduce an extra variable that captures a smaller subexpression. This extra variable is sometimes called an “explaining variable” because it helps explain what the subexpression means.”
“One technique [to clean up code] is to see if you can solve the problem the “opposite” way. Depending on the situation you’re in, this could mean iterating through arrays in reverse or filling in some data structure backward rather than forward.”
In general, keep scopes short and finish tasks as quickly as possible.
“The more places a variable is manipulated, the harder it is to reason about its current value.”
“The advice for this chapter is to aggressively identify and extract unrelated subproblems. Here’s what we mean: Look at a given function or block of code, and ask yourself, “What is the high-level goal of this code?” For each line of code, ask, “Is it working directly to that goal? Or is it solving an unrelated subproblem needed to meet it?” If enough lines are solving an unrelated subproblem, extract that code into a separate function.”
“Code that does multiple things at once is harder to understand. A single block of code might be initializing new objects, cleansing data, parsing inputs, and applying business logic, all at the same time.”
Break code up into small, easy to understand fragments. This makes understanding easier and reduces potential for bugs.
“do only one task at a time.”
“When explaining a complex idea to someone, it’s easy to confuse them with all the little details. It’s a valuable skill to be able to explain an idea “in plain English,” so that someone less knowledgeable than you can understand. It requires distilling an idea down to the most important concepts. Doing this not only helps the other person understand but also helps you think about your own ideas more clearly.”
“This chapter discussed the simple technique of describing your program in plain English and using that description to help you write more natural code. This technique is deceptively simple, but very powerful.”
“Knowing when not to code is possibly the most important skill a programmer can learn. Every line of code you write is a line that has to be tested and maintained. By reusing libraries or eliminating features, you can save time and keep your codebase lean and mean.”
“When you start a project, it’s natural to get excited and think of all the cool features you’ll want to implement. But programmers tend to overestimate how many features are truly essential to their project. A lot of features go unfinished or unused or just complicate the application.”