5 Business Problems You Can Solve With Temporal Tables

3404b-14frjyurrbemzjlzyfhtz7w

Watch this week's video on YouTube

It's 4:30 pm on Friday and Mr. Manager comes along to tell you that he needs you to run some important ad-hoc analysis for him.

Previously this meant having to stay late at the office, writing cumbersome queries to extract business information from transactional data.

Lucky for you, you've recently started using Temporal Tables in SQL Server ensuring that you'll be able to answer your boss's questions and still make it to happy hour for \$3 margaritas.

Sound like a plan? Keep reading below!

The Data

For these demos, we'll be using my imaginary car rental business data. It consists of our temporal table dbo.CarInventory and our history table dbo.CarInventoryHistory:

I've upgraded my business — we now have FOUR Chevy Malibus available for you to rent

Business Problem #1 — "Get me current inventory!"

To get our current inventory of rental cars, all we have to do is query the temporal table:

SELECT * FROM dbo.CarInventory

That's it.

I know this query seems lame — it's just a SELECT FROM statement. There are no FOR SYSTEM TIME clauses, WHERE statements, and no other interesting T-SQL features.

But that's the point! Have you ever had to get the "current" rows out of a table that is keeping track of all transactions? I'm sure it involved some GROUP BY statements, some window functions, and more than a few cups of coffee.

Temporal tables automatically manage your transaction history, providing the most current records in one table (dbo.CarInventory) and all of the historical transactions in another (dbo.CarInventoryHistory). No need for complicated queries.

Business Problem #2 — "How many miles on average do our customers drive each of our cars?"

In this example, we use FOR SYSTEM_TIME ALL and a plain old GROUP BY to get the data we need:

  SELECT
    CarId, AVG(Mileage) AS AverageMileage
  FROM
    dbo.CarInventory FOR SYSTEM_TIME ALL
  WHERE
    InLot = 1 -- The car has been successfully returned to our lot
    AND SysStartTime > '2017-05-13 08:00:00.0000000' -- Ignore our initial car purchase
  GROUP BY
    CarId

Some cars get driven a lot more. Causation or correlation?

FOR SYSTEM_TIME ALL returns all rows from both the temporal and history table. It's equivalent to:

SELECT * FROM dbo.CarInventory 
UNION ALL 
SELECT * FROM dbo.CarInventoryHistory

Once again, there isn't anything too fancy going on here — but that's the point. With temporal tables, your data is organized to make analysis easier.

Business Problem #3 — "How many cars do we rent out week over week?"

Here at Wagner Car Rentals we want to figure out how often our cars are being rented and see how those numbers change from week to week.

SELECT
  CurrentWeek.CarId,
  CurrentWeek.RentalCount AS CurrentRentalCount,
  PreviousWeek.RentalCount AS PreviousRentalCount
FROM
  (
  SELECT
    CarId,
    COUNT(*) AS RentalCount
  FROM
    dbo.CarInventory FOR SYSTEM_TIME FROM '2017-06-05' TO '2017-06-12'
  WHERE
    InLot = 0 -- Car is out with the customer
  GROUP BY
    CarId
  ) CurrentWeek
  FULL JOIN
  (
  SELECT
    CarId,
    COUNT(*) AS RentalCount
  FROM
    dbo.CarInventory FOR SYSTEM_TIME FROM '2017-05-29' TO '2017-06-05'
  WHERE
    InLot = 0 -- Car is out with the customer
  GROUP BY
    CarId
  ) PreviousWeek
  ON CurrentWeek.CarId = PreviousWeek.CarId

0771c-1jwllidj567utcv3bgdsjow

In this query, we are using FOR SYSTEM_TIME FOR/TO on our temporal table to specify what data we want in our "CurrentWeek" and "PreviousWeek" subqueries.

FOR/TOreturns any records that were active during the specified range(BETWEEN/AND does the same thing, but its upper bound datetime2 value is inclusive instead of exclusive).

Business Problem #4 — "What color cars are rented most often?"

We're thinking of expanding our fleet of rental vehicles and want to purchase cars in the most popular colors so we can keep customers happy (and get more of their business!). How can we tell which color cars get rented most often?

SELECT 
  CarId, 
  Color, 
  COUNT(*)/2 AS RentalCount -- Divide by 2 because transactions are double counted (rental and return dates)
FROM 
  dbo.CarInventory FOR SYSTEM_TIME CONTAINED IN ('2017-05-15','2017-06-15')
GROUP BY 
  CarId,
  Color

Here we use CONTAINED IN because we want to get precise counts of how many cars were rented and returned in a specific date range (if a car wasn't returned — stolen, wrecked and totaled, etc… — we don't want to purchase more of those colors in the future).

67c8c-1xtqczezvl0y_s6hxzhpi0g

Business Problem #5 — "Jerry broke it. FIX IT!"

The computer systems that we use at Wagner Car Rentals are a little…dated.

Instead of scanning a bar code to return a car back into our system, our employees need to manually type in the car details. The problem here is that some employees (like Jerry) can't type, and often makes typos:

SELECT * FROM dbo.CarInventory FOR SYSTEM_TIME ALL WHERE CarId = 4

c0dd3-1nxqy43njb7r63theaq0kja

Having inconsistent data makes our reporting much more difficult, but fortunately since we have our temporal table tracking row-level history, we can easily correct Jerry's typos by pulling the correct values from a previous record:

;WITH InventoryHistory  
AS  
(   
  SELECT ROW_NUMBER () OVER (PARTITION BY CarId ORDER BY SysStartTime DESC) AS RN, *  
  FROM dbo.CarInventory FOR SYSTEM_TIME ALL WHERE CarId = 4  
)  
--SELECT * FROM InventoryHistory
/*Update current row by using N-th row version from history (default is 1 - i.e. last version)*/  
UPDATE dbo.CarInventory   
  SET Color = h.Color  
  FROM 
    dbo.CarInventory i 
    INNER JOIN InventoryHistory h 
        ON i.CarId = h.CarId 
        AND RN = 2

Typos fixed!

Although we could have fixed this issue without using a temporal table, it shows how having all of the row-level transaction history makes it possible to repair incorrect data in more difficult scenarios. For even hairier situations, you can even roll-back your temporal table data.

Conclusion

Temporal tables are easy to setup and make writing analytical queries a cinch.

Hopefully writing queries against temporal tables will prevent you from having to stay late in the office the next time your manager asks you to run some ad-hoc analysis.

How To Roll-Back Data in a Temporal Table

"The Big Red Button" by włodi used under CC BY-SA 2.0 / Cropped and text added from original

Watch this week's video on YouTube

So you've started using temporal tables because they make your point-in-time analysis queries super easy.

Your manager is happy because you're getting historical data to him quickly. Your DBA is happy because she doesn't have to clean up any performance killing triggers that replicate a temporal table's functionality. Everything with temporal tables has made your life better.

Except that time when you accidentally inserted some bad data into your temporal table.

Whoops

The good news is that all of your data is still intact — it's been copied over to the historical table. Phew!

Now all you need to do is rollback this inadvertent row insertion and make your tables look just like you did before you started breaking them.

This should be easy right?

Well not exactly — there's no automatic way to roll back the data in a temporal table. However, that doesn't mean we can't write some clever queries to accomplish the same thing.

Let's make some data

Don't mind the details of this next query too much. It uses some non-standard techniques to fake the data into a temporal/historical table with "realistic" timestamps:

IF OBJECT_ID('dbo.CarInventory', 'U') IS NOT NULL 
BEGIN
    -- When deleting a temporal table, we need to first turn versioning off
    ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF  ) 
    DROP TABLE dbo.CarInventory
    DROP TABLE dbo.CarInventoryHistory
END;
CREATE TABLE CarInventory   
(    
    CarId INT IDENTITY PRIMARY KEY NOT NULL,
    Year INT,
    Make VARCHAR(40),
    Model VARCHAR(40),
    Color varchar(10),
    Mileage INT,
    InLot BIT NOT NULL DEFAULT 1   
);
CREATE TABLE CarInventoryHistory  
(    
    CarId INT NOT NULL,
    Year INT,
    Make VARCHAR(40),
    Model VARCHAR(40),
    Color varchar(10),
    Mileage INT,
    InLot BIT NOT NULL,
    SysStartTime datetime2 NOT NULL, 
    SysEndTime datetime2   NOT NULL 
); 

INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(1,2017,'Chevy','Malibu','Black',0,1,'2017-05-13 8:00:00.0000000','2017-05-14 8:00:00.0000000');
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(2,2017,'Chevy','Malibu','Silver',0,1,'2017-05-13 8:00:00.0000000','2017-05-14 9:00:00.0000000');
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(1,2017,'Chevy','Malibu','Black',0,0,'2017-05-14 8:00:00.0000000','2017-05-15 7:00:00.0000000');
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(2,2017,'Chevy','Malibu','Silver',0,0,'2017-05-14 9:00:00.0000000','2017-05-19 15:00:00.0000000');
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(1,2017,'Chevy','Malibu','Black',73,1,'2017-05-15 7:00:00.0000000','2017-05-16 10:00:00.0000000');
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(2,2017,'Chevy','Malibu','Silver',488,1,'2017-05-19 15:00:00.0000000','2017-05-20 08:00:00.0000000');

ALTER TABLE dbo.CarInventory
ADD SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL
    CONSTRAINT DF_SysStart DEFAULT SYSUTCDATETIME(),
SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL
    CONSTRAINT DF_SysEnd DEFAULT '9999-12-31 23:59:59.9999999',
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);

SET IDENTITY_INSERT dbo.CarInventory ON;
INSERT INTO dbo.CarInventory (CarId,Year,Make,Model,Color,Mileage,InLot) VALUES(1,2017,'Chevy','Malibu','Black',120,1);
INSERT INTO dbo.CarInventory (CarId,Year,Make,Model,Color,Mileage,InLot) VALUES(2,2017,'Chevy','Malibu','Silver',591,1);
SET IDENTITY_INSERT dbo.CarInventory OFF;

-- We need to make sure that the last SysEndTimes in our historical table match the SysStartTimes in our temporal table
DECLARE @LastSysStartTimeInTemporalCar1 DATETIME2, @LastSysStartTimeInTemporalCar2 DATETIME2
SELECT @LastSysStartTimeInTemporalCar1 = SysStartTime FROM dbo.CarInventory WHERE CarId = 1
SELECT @LastSysStartTimeInTemporalCar2 = SysStartTime FROM dbo.CarInventory WHERE CarId = 2
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(1,2017,'Chevy','Malibu','Black',73,0,'2017-05-16 10:00:00.0000000',@LastSysStartTimeInTemporalCar1);
INSERT INTO dbo.CarInventoryHistory (CarId,Year,Make,Model,Color,Mileage,InLot,SysStartTime,SysEndTime) VALUES(2,2017,'Chevy','Malibu','Silver',488,0,'2017-05-20 08:00:00.0000000',@LastSysStartTimeInTemporalCar2);

ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory));

-- If everything worked well, we should see our data correctly in these table
SELECT * FROM dbo.CarInventory
SELECT * FROM dbo.CarInventoryHistory

If you look at the results of our temporal table (top) and historical table (bottom), they should look something like this:

This data represents my totally real, very very not-fake rental car business.

You see those two rows in the top temporal table? Those are the ones I just added accidentally. I actually had a bug in my code *ahem* and all of the data inserted after 2017–05–18 is erroneous.

The bug has been fixed, but we want to clean up the incorrect entries and roll back the data in our temporal tables to how it looked on 2017–05–18. Basically, we want the following two rows to appear in our "current" temporal table and the historical table to be cleaned up of any rows inserted after 2017–05–18:

ec914-15dfitymhzrrzuxxwf6fq_w

Fortunately, we can query our temporal table using FOR SYSTEM_TIME AS OF to get the two rows highlighted above pretty easily. Let's do that and insert into a temp table called ##Rollback:

DROP TABLE IF EXISTS ##Rollback
SELECT
    *
INTO ##Rollback
FROM 
    dbo.CarInventory
FOR SYSTEM_TIME AS OF '2017-05-18'

-- Update the SysEndTime to the max value because that's what it's always set to in the temporal table
UPDATE ##Rollback SET SysEndTime = '9999-12-31 23:59:59.9999999'

You'll notice we also updated the SysEndTime — that's because a temporal table always has its AS ROW END column set to the max datetime value.

Looking at ##Rollback, we have the data we want to insert into our temporal table:

This is the data we want!

Now, it'd be nice if we could just insert the data from #Rollback straight into our temporal table, but that would get tracked by the temporal table!

So instead, we need to turn off system versioning, allow identity inserts, delete our existing data, and insert from ##Rollback. Basically:

ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF);

SET IDENTITY_INSERT dbo.CarInventory ON;

DELETE FROM dbo.CarInventory WHERE CarId IN (SELECT DISTINCT CarId FROM ##Rollback)

INSERT INTO dbo.CarInventory (CarId,Year,Make,Model,Mileage,Color,InLot)
SELECT CarId,Year,Make,Model,Mileage,Color,InLot
FROM ##Rollback

While system versioning is off, we can also clean up the historical table by deleting all records after 2017–05–18 by joining the ##Rollback temp table on SysStartTime:

DELETE h
FROM ##Rollback t
    INNER JOIN dbo.CarInventoryHistory h 
    ON
    h.CarId = t.CarId
    AND t.SysStartTime <= h.SysStartTime

We have rolled back our data successfully!

482a5-16k6xjuxuri0agyghgjoiza

Only One Tiny Problem

Did you notice that the last SysEndTime values in our historical table don't match up with the SysStartTime values in our temporal table?

1fd81-1jlehwp11deydac7hrc5npg

This is a data integrity issue for our temporal table — our datetimes should always be continuous.

Fortunately this is easily fixed with one more UPDATE statement:

UPDATE t
SET t.SysEndTime = i.SysStartTime
FROM dbo.CarInventoryHistory t
    INNER JOIN ##Rollback r 
    ON t.CarId = r.CarId
    AND t.SysEndTime = r.SysStartTime
    INNER JOIN dbo.CarInventory i
    ON t.CarId = i.CarId
    AND r.CarId = i.CarId

SELECT * FROM dbo.CarInventory
SELECT * FROM dbo.CarInventoryHistory

Our correctly rolled back temporal table

Finally, remember to turn system versioning back on and to turn off our identity inserts to restore the original functionality of our temporal tables:

ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory));
SET IDENTITY_INSERT dbo.CarInventory OFF;

Congratulations, you've rolled back your temporal table data!

How To Use Temporal Tables For Easy Point-In-Time Analysis

Watch this week's video on YouTube

Have you ever needed to look at what data in a table used to look like?

If you have, it probably took a knuckle-cracking filled session of writing group-by statements, nested sub-queries, and window functions to write your time-travelling query.

Sorry for your lost day of productivity — I've been there too.

Fortunately for us, SQL Server 2016 introduces a new feature to make our point-in-time analysis queries easy to write: temporal tables.

Temporal Tables? Are Those The Same As Temporary Tables?

Don't let the similar sounding name fool you: "temporal" <> "temporary".

Temporal tables consist of two parts:

  1. The temporal table — this is the table that contains the current values of your data.
  2. The historical table — this table holds all of the previous values that at some point existed in your temporal table.

You might have created a similar setup yourself in previous versions of SQL using triggers. However, using a temporal table is different from this because:

  1. You don't need to write any triggers/stored procedures! All of the history tracking is done automatically by SQL Server.
  2. Retrieving the data uses a simple WHERE clause — no complex querying required.

I want to make my life easier by using temporal tables! Take my money and show me how!

I'm flattered by your offer, but since we are good friends I'll let you in on these secrets for free.

First let's create a temporal table. I'm thinking about starting up a car rental business, so let's model it after that:

IF OBJECT_ID('dbo.CarInventory', 'U') IS NOT NULL 
BEGIN
    -- When deleting a temporal table, we need to first turn versioning off
    ALTER TABLE [dbo].[CarInventory] SET ( SYSTEM_VERSIONING = OFF  ) 
    DROP TABLE dbo.CarInventory
    DROP TABLE dbo.CarInventoryHistory
END
CREATE TABLE CarInventory   
(    
    CarId INT IDENTITY PRIMARY KEY,
    Year INT,
    Make VARCHAR(40),
    Model VARCHAR(40),
    Color varchar(10),
    Mileage INT,
    InLot BIT NOT NULL DEFAULT 1,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)     
)   
WITH 
( 
    SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory)   
)

The key things to note with our new table above are that

  1. it contains a PRIMARY KEY.
  2. it contains two datetime2 fields, marked with GENERATED ALWAYS AS ROW START/END.
  3. It contains the PERIOD FOR SYSTEM_TIME statement.
  4. It contains the SYSTEM_VERSIONING = ON property with the (optional) historical table name (dbo.CarInventoryHistory).

If we query our newly created tables, you'll notice our column layouts are identical:

SELECT * FROM dbo.CarInventory
SELECT * FROM dbo.CarInventoryHistory

Let's fill it with the choice car of car rental agencies all across the U.S. — the Chevy Malibu:

INSERT INTO dbo.CarInventory (Year,Make,Model,Color,Mileage) VALUES(2017,'Chevy','Malibu','Black',0)
INSERT INTO dbo.CarInventory (Year,Make,Model,Color,Mileage) VALUES(2017,'Chevy','Malibu','Silver',0)

In all of the remaining screen shots, the top result is our temporal table dbo.CarInventory and the bottom result is our historical table dbo.CarInventoryHistory.

You'll notice that since we've only inserted one row for each our cars, there's no row history yet and therefore our historical table is empty.

Let's change that by getting some customers and renting out our cars!

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 1
UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

Now we see our temporal table at work: we updated the rows in dbo.CarInventory and our historical table was automatically updated with our original values as well as timestamps for how long those rows existed in our table.

After a while, our customers return their rental cars:

UPDATE dbo.CarInventory SET InLot = 1, Mileage = 73  WHERE CarId = 1
UPDATE dbo.CarInventory SET InLot = 1, Mileage = 488 WHERE CarId = 2

Our temporal table show the current state of our rental cars: the customers have returned the cars back to our lot and each car has accumulated some mileage.

Our historical table meanwhile got a copy of the rows from our temporal table right before our last UPDATE statement. It's automatically keeping track of all of this history for us!

Continuing on, business is going well at the car rental agency. We get another customer to rent our silver Malibu:

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

Unfortunately, our second customer gets into a crash and destroys our car:

DELETE FROM dbo.CarInventory WHERE CarId = 2

With the deletion of our silver Malibu, our test data is complete.

Now that we have all of this great historically tracked data, how can we query it?

If we want to reminisce about better times when both cars were damage free and we were making money, we can write a query using SYSTEM_TIME AS OF to show us what our table looked like at that point in the past:

SELECT
    *
FROM 
    dbo.CarInventory
FOR SYSTEM_TIME AS OF '2017-05-18 23:49:50'

And if we want to do some more detailed analysis, like what rows have been deleted, we can query both temporal and historical tables normally as well:

-- Find the CarIds of cars that have been wrecked and deleted
SELECT DISTINCT
    h.CarId AS DeletedCarId
FROM
    dbo.CarInventory t
    RIGHT JOIN dbo.CarInventoryHistory h
    ON t.CarId = h.CarId 
WHERE 
    t.CarId IS NULL

C̶o̶l̶l̶i̶s̶i̶o̶n̶ Conclusion

Even with my car rental business not working out, at least we were able to see how SQL Server's temporal tables helped us keep track of our car inventory data.

I hope you got as excited as I did the first time I saw temporal tables in action, especially when it comes to querying with FOR SYSTEM_TIME AS OF. Long gone are the days of needing complicated queries to rebuild data for a certain point in time.

How to fix your terribly slow SQL job

This is a real-time progress bar for some of my old jobs. They are still running and stuck at 3%.

How many times have you written a program, ETL, analysis job, etc… that seemed like it would never finish running?

Although poor performance can be caused in a multitude of ways, the easiest to fix is by reducing your data in SQL Server instead of your in your programming/ETL/analysis layer (Excel, R, SAS, Python, ..NET, etc…).

SQL is built to handle and process data extremely efficiently. You will usually experience much better performance the more work (data merging, transformations, etc…) you can do to your data on the SQL server. I say "usually" because SQL won't always be faster than a programming language at transforming data, but 9 times out of 10 you can get faster results straight on the SQL Server.

Watch this week's video on YouTube

Let's look at one of my crappy processes

How many of us have ever written a process that does something like this:

1. Write the most basic query possible, something like SELECT * FROM dbo.User

2. Take the output of the above query, load it into Excel/SAS/Python/.NET/etc…

3. Write some code to filter the dataset

4. Write some code to summarize the data, transform columns, etc…

5. Write another SELECT * FROM dbo.Sale against the SQL Server to bring in more data

6. Bring it into Excel/SAS/Python/.NET/etc… and merge it with our original data

7. Repeat steps 3–6 as many times as needed

Some of my earliest PHP and MySQL websites worked exactly like this 😳! The code was slow on my server and users ended up suffering with slow webpage load times.

If the above process even slightly resembles something you've written before, continue reading on…

Why bother learning to transform data in SQL? I already know how to do that stuff in .

Old habits are hard to break, but you do want to make your processes run faster, right? This stuff is all easy, I promise!

Basically, if you are running code similar to above, the reason your job is slow is because you are not optimizing where your work is being performed:

  • Every time you write SELECT * you probably are bringing back more data than you actually need — you are hurting your performance.
  • Every time you don't have a WHERE clause, you are hurting your performance.
  • Every time your process queries the database multiple times (ie. multiple SELECT statements in your job to bring back data), you are hurting your performance.

In case you missed it, not taking the time to filter and reduce your data down as much as possible in your SQL is hurting your performance! Assuming your SQL Server and your programming layer are on different machines, you lose lots of time transferring unnecessary data over the wires (or air) as well as not efficiently using all of the advantages that your SQL server offers.

What's the solution to this inefficient processing?

Process your data on the SQL Server!

If you are not filtering, joining, and transforming your data until your programming layer, you are likely losing valuable SQL performance power and network efficiency. Here are some easy ways to reduce the size of your dataset on the SQL Server to improve performance in your jobs (and make your coworkers envious of your skills)!

SELECT [ColumnName]

If you are using SELECT *, stop!

SELECT * brings back all of the columns on your table, including the ones you don't need. This increases the amount of data sent over the network (which doesn't even get used) as well as increases the amount of data that needs to be read from disk (and storage hardware is usually relatively slow). Not to mention if your table is using indexes, SELECT * most likely causes some of those indexes not to be used as efficiently (or at all) which causes your queries to slow down even further.

But what if you do need all of the columns on a particular table? You still shouldn't use SELECT *! Although there's no performance difference, using SELECT * just means you are taking on technical debt. In the future, when a column gets added or removed from your table, your downstream processes may break because they are now automatically receiving (or no longer receiving) that column. Do you want to have to fix a failing process in the future because its now receiving more data that it was expecting? I don't think so!

JOINs

My inefficient process example above starts with selecting some data and bringing it into my programming environment. The process then runs another query to bring in additional data and joins it to the data from my first query in my programming environment.

This is terrible!

First off, we are breaking the first principle we learned in the SELECT * section above — we are bringing back more data than we need! If we are using INNER JOIN on our two datasets, we most likely are going to be filtering out some data — data we don't need. Joining on the SQL server first will reduce our total dataset size and make our network and disk performance more efficient.

Even if we are doing something like a LEFT or FULL OUTER join where we will be keeping all of the data from one or both of our datasets, it still benefits us to perform this join on the SQL Server. Why you ask? Because the people who built SQL Server have spent hundreds or thousands of hours performance tuning and debugging their joining algorithms. The chances that you will be able to write a more efficient join algorithm is highly unlikely.

And even if you are a programming savant, why reinvent the wheel? Unless your app needs every last microsecond of performance, just use SQL Server for what it's really good at: relational data joining.

WHERE Clauses

Let's say our dbo.User table has 50 thousand rows and our dbo.Sale table has 1 million rows. If your process is only looking for active users and sales from the past month, let's say 2 thousand rows and 22,000 rows respectively, then you are causing SQL to lookup and transfer 95% more rows than your process needs. Not only does it kill network performance, but your program layer then needs to filter out this data, doing extra work that it probably can't do as efficiently as SQL Server.

If instead I would have just added predicates to the SQL WHERE clause like Active=1 and SalesDate >= DATEADD(month, -1, GETDATE()) we would have saved both time and bandwidth.

Aggregate Functions

You know what's better than sending 10,000 rows of data over the network and then summing them up in your programming layer?

Using SQL's SUM() aggregate function to reduce those 10,000 rows to just 1 row before sending it across the network.

SQL aggregate functions take many rows of data and consolidate them down into fewer rows.

SQL's aggregate functions are also flexible enough to use the OVER() clause, allowing for windowed sets within your data — basically allowing you to be even more flexible with how you aggregate your data.

Don't wait until your application layer to summarize parts of your data — do it in your SQL query instead.

Scalar Functions

Although aggregate functions do some serious heavy lifting, scalar functions that run on each row of data aren't anything to laugh at either. Although they won't reduce the number of rows in your output, they can certainly reduce the number of columns you are outputting.

For example, say you have multiple columns of data in your dataset that ultimately need to be combined into a single output column. It's much better to use ISNULL(), COALESCE(), or CASE to combine multiple columns into a single column with logic in your SQL query so less data needs to be transferred later.

Once again, reducing the amount of data you are sending over the network is key to getting faster run times.

XML and JSON Functions

Last but not least, if your process is generating XML or JSON data at some point, consider generating that data on the SQL Server. Now, generating XML and JSON data won't always improve your performance — SQL Server is best at relational tasks and not large string creation — but in many cases, especially with JSON, SQL Server can outperform even the fastest .NET libraries.

If your network is your bottle neck, then it is very possible that SQL can apply complex logic and transform your data into XML or JSON faster on the SQL Server than if you needed to transfer all of that data to another location on the network and handle those transformations in another programming language.

In short: do as much work as possible in SQL

If your SQL queries could be following any of the above techniques and they're not, then fix them…today! Checking each of your queries for any of the above inefficiencies and mitigating them will probably (always test your changes) improve the performance of your applications and processes.

And then it won't feel like your process is taking forever to run.

The Ultimate SQL Server JSON Cheat Sheet

f23b7-1bpskynllsyfmagyrr-svwa

This post is a reference of my examples for processing JSON data in SQL Server. For more detailed explanations of these functions, please see my post series on JSON in SQL Server 2016:

Additionally, the complete reference for SQL JSON handling can be found at MSDN: https://msdn.microsoft.com/en-us/library/dn921897.aspx


Parsing JSON

Getting string JSON data into a SQL readable form.

ISJSON()

Checks to see if the input string is valid JSON.

SELECT ISJSON('{ "Color" : "Blue" }') -- Returns 1, valid
-- Output: 1

SELECT ISJSON('{ "Color" : Blue }') -- Returns 0, invalid, missing quotes
-- Output: 0

SELECT ISJSON('{ "Number" : 1 }') -- Returns 1, valid, numbers are allowed
-- Output: 1

SELECT ISJSON('{ "PurchaseDate" : "2015-08-18T00:00:00.000Z" }') -- Returns 1, valid, dates are just strings in ISO 8601 date format https://en.wikipedia.org/wiki/ISO_8601
-- Output: 1

SELECT ISJSON('{ "PurchaseDate" : 2015-08-18 }') -- Returns 0, invalid
-- Output: 0

JSON_VALUE()

Extracts a specific scalar string value from a JSON string using JSON path expressions.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

SELECT JSON_VALUE(@garage, '$.Cars[0].Make') -- Return the make of the first car in our array
-- Output: Volkswagen

SELECT CAST(JSON_VALUE(@garage, '$.Cars[0].PurchaseDate') as datetime2) -- Return the Purchase Date of the first car in our array and convert it into a DateTime2 datatype
-- Output: 2006-10-05 00:00:00.0000000

SELECT JSON_VALUE(@garage, '$.Cars') -- This returns NULL because the values of Cars is an array instead of a simple object
-- Output: NULL

SELECT JSON_VALUE(@garage, '$.Cars[1].Model') -- This is also invalid because JSON_VALUE cannot return an array...only scalar values allowed!
-- Output: NULL

SELECT JSON_VALUE(@garage, '$.Cars[1].Model.Base') -- Much better
-- Output: Impreza

Strict vs. Lax mode

If the JSON path cannot be found, determines if the function should return a NULL or an error message.

-- Lax (default: function will return an error if invalid JSON path specified
SELECT JSON_VALUE('{ "Color" : "Red" }', '$.Shape') --lax is the default, so you don't need to be explicitly state it
-- Output: NULL

SELECT JSON_VALUE('{ "Color" : "Red" }', 'lax $.Shape')
-- Output: NULL

-- Strict: function will return an error if invalid JSON path specified
SELECT JSON_VALUE('{ "Color" : "Red" }', 'strict $.Shape')
-- Output: Property cannot be found on the specified JSON path.

JSON_QUERY()

Returns a JSON fragment for the specified JSON path.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- This returns NULL because the values of Cars is an array instead of a simple object
SELECT JSON_VALUE(@garage, '$.Cars') 
-- Output: NULL

-- Using JSON_QUERY() however returns the JSON string representation of our array object
SELECT JSON_QUERY(@garage, '$.Cars') 
-- Output: [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }]

-- This instance of JSON_VALUE() correctly returns a singular scalar value
SELECT JSON_VALUE(@garage, '$.Cars[0].Make')
-- Output: Volkswagen

-- Using JSON_QUERY will not work for returning scalar values - it only will return JSON strings for complex objects
SELECT JSON_QUERY(@garage, '$.Cars[0].Make')
-- Output: NULL

This is useful to help filter an array and then extract values with JSON_VALUE():

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- We use JSON_QUERY to get the JSON representation of the Cars array
SELECT JSON_QUERY(@garage, '$.Cars')
-- Output: [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }]

-- If we combine it with JSON_VALUE we can then pull out specific scalar values
SELECT JSON_VALUE(JSON_QUERY(@garage, '$.Cars') , '$[0].Make')
-- Output: Volkswagen

OPEN_JSON()

Returns a SQL result set for the specified JSON path. The result set includes columns identifying the datatypes of the parsed data.

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

SELECT * FROM OPENJSON(@garage, '$.Cars') -- Displaying the values of our "Cars" array.  We additionally get the order of the JSON objects outputted in the "key" column and the JSON object datatype in the "type" column
/* Output:
key    value                                                                                                                                type
------ ------------------------------------------------------------------------------------------------------------------------------------ ----
0      { "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }        5
1      { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }    5
*/

SELECT * FROM OPENJSON(@garage, '$.Cars[0]') -- Specifying the first element in our JSON array.  JSON arrays are zero-index based
/* Output:
key              value                                 type
---------------- ------------------------------------- ----
Make             Volkswagen                            1
Model            { "Base": "Golf", "Trim": "GL" }      5
Year             2003                                  2
PurchaseDate     2006-10-05T00:00:00.000Z              1
*/

SELECT * FROM OPENJSON(@garage, '$.Cars[0].Model') -- Pulling the Model property from the first element in our Cars array
/* Output:
key     value   type
------- ------- ----
Base    Golf    1
Trim    GL      1
*/

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- Here we retrieve the Make of each vehicle in our Cars array
SELECT JSON_VALUE(value, '$.Make') FROM OPENJSON(@garage, '$.Cars') 
/* Output: 
------------
Volkswagen
Subaru
*/ 

-- Parsing and converting some JSON dates to SQL DateTime2
SELECT CAST(JSON_VALUE(value, '$.PurchaseDate') as datetime2) FROM OPENJSON(@garage, '$.Cars') 
/* Output: 
---------------------------
2006-10-05 00:00:00.0000000
2015-08-18 00:00:00.0000000
*/ 

-- We can also format the output schema of a JSON string using the WITH option.  This is especially cool because we can bring up values from sub-arrays (see Model.Base and Model.Trim) to our top-level row result
SELECT * FROM OPENJSON(@garage, '$.Cars')
 WITH (Make varchar(20) 'strict $.Make',  
       ModelBase nvarchar(100) '$.Model.Base',
       ModelTrim nvarchar(100) '$.Model.Trim',
        Year int '$.Year',  
       PurchaseDate datetime2 '$.PurchaseDate') 
/* Output: 
Make           ModelBase   Year        PurchaseDate
-------------- ----------- ----------- ---------------------------
Volkswagen     Golf        2003        2006-10-05 00:00:00.0000000
Subaru         Impreza     2016        2015-08-18 00:00:00.0000000
*/

Creating JSON

Creating JSON data from either strings or result sets.

FOR JSON AUTO

Automatically creates a JSON string from a SELECT statement. Quick and dirty.

-- Create our table with test data
DROP TABLE IF EXISTS ##Garage;
CREATE TABLE ##Garage
(
    Id int IDENTITY(1,1),
    Make varchar(100),
    BaseModel varchar(50),
    Trim varchar(50),
    Year int,
    PurchaseDate datetime2
);
INSERT INTO ##Garage VALUES ('Volkswagen', 'Golf', 'GL', 2003, '2006-10-05');
INSERT INTO ##Garage VALUES ('Subaru', 'Impreza', 'Premium', 2016, '2015-08-18');

-- Take a look at our data
SELECT * FROM ##Garage;

-- AUTO will format a result into JSON following the same structure of the result set
SELECT Make, BaseModel, Trim, Year, PurchaseDate
FROM ##Garage
FOR JSON AUTO;
-- Output: [{"Make":"Volkswagen","BaseModel":"Golf","Trim":"GL","Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","BaseModel":"Impreza","Trim":"Premium","Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]

-- Using aliases will rename JSON keys
SELECT Make AS [CarMake] 
FROM ##Garage 
FOR JSON AUTO;
-- Output: [{"CarMake":"Volkswagen"},{"CarMake":"Subaru"}]

-- Any joined tables will get created as nested JSON objects.  The alias of the joined tables becomes the name of the JSON key
SELECT g1.Make,  Model.BaseModel as Base, Model.Trim, g1.Year, g1.PurchaseDate
FROM ##Garage g1
INNER JOIN ##Garage Model on g1.Id = Model.Id
FOR JSON AUTO;
-- Output: [{"Make":"Volkswagen","Year":2003,"PurchaseDate":"2006-10-05T00:00:00","Model":[{"Base":"Golf","Trim":"GL"}]},{"Make":"Subaru","Year":2016,"PurchaseDate":"2015-08-18T00:00:00","Model":[{"Base":"Impreza","Trim":"Premium"}]}]

-- Finally we can encapsulate our entire JSON result in a parent element by specifiying the ROOT option
SELECT Make, BaseModel, Trim, Year, PurchaseDate
FROM ##Garage
FOR JSON AUTO, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","BaseModel":"Golf","Trim":"GL","Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","BaseModel":"Impreza","Trim":"Premium","Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

FOR JSON PATH

Formats a SQL query into a JSON string, allowing the user to define structure and formatting.

-- PATH will format a result using dot syntax in the column aliases.  Here's an example with just default column names
SELECT Make, BaseModel, Trim, Year, PurchaseDate
FROM ##Garage
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","BaseModel":"Golf","Trim":"GL","Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","BaseModel":"Impreza","Trim":"Premium","Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- And here is the same example, just assigning aliases to define JSON nested structure
SELECT Make, BaseModel as [Model.Base], Trim AS [Model.Trim], Year, PurchaseDate
FROM ##Garage
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL"},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium"},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- We can actually go multiple levels deep with this type of alias dot notation nesting
SELECT Make, BaseModel as [Model.Base], Trim AS [Model.Trim], 'White' AS [Model.Color.Exterior], 'Black' AS [Model.Color.Interior], Year, PurchaseDate
FROM ##Garage
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL","Color":{"Exterior":"White","Interior":"Black"}},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium","Color":{"Exterior":"White","Interior":"Black"}},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- Concatenating data rows with UNION or UNION ALL just adds the row as a new element as part of the JSON array
SELECT Make,  BaseModel AS [Model.Base], Trim AS [Model.Trim], Year, PurchaseDate
FROM ##Garage WHERE Id = 1
UNION ALL
SELECT Make,  BaseModel, Trim, Year, PurchaseDate
FROM ##Garage WHERE Id = 2
FOR JSON PATH, ROOT('Cars');
-- Output: {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL"},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"},{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium"},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}

-- We can even include our FOR JSON in our SELECT statement to generate JSON strings for each row of our result set
SELECT g1.*, (SELECT Make, BaseModel AS [Model.Base], Trim AS [Model.Trim], Year, PurchaseDate FROM ##Garage g2 WHERE g2.Id = g1.Id FOR JSON PATH, ROOT('Cars')) AS [Json]
FROM ##Garage g1
/* Output: 
Id  Make          BaseModel    Trim      Year    PurchaseDate                Json
--- ------------- ------------ --------- ------- --------------------------- --------------------------------------------------------------------------------------------------------------------------
1   Volkswagen    Golf         GL        2003    2006-10-05 00:00:00.0000000 {"Cars":[{"Make":"Volkswagen","Model":{"Base":"Golf","Trim":"GL"},"Year":2003,"PurchaseDate":"2006-10-05T00:00:00"}]}
2   Subaru        Impreza      Premium   2016    2015-08-18 00:00:00.0000000 {"Cars":[{"Make":"Subaru","Model":{"Base":"Impreza","Trim":"Premium"},"Year":2016,"PurchaseDate":"2015-08-18T00:00:00"}]}
*/

Modifying JSON

Updating, adding to, and deleting from JSON data.

JSON_MODIFY()

Allows the user to update properties and values, add properties and values, and delete properties and values (the delete is unintuitive, see below).

Modify:

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GL" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- I upgraded some features in my Volkswagen recently, technically making it equivalent to a "GLI" instead of a "GL".  
-- Let's update our JSON using JSON_MODIFY:
SET @garage = JSON_MODIFY(@garage, '$.Cars[0].Model.Trim', 'GLI')
SELECT @garage
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }

Add:

-- See https://gist.github.com/bertwagner/356bf47732b9e35d2156daa943e049e9 for a formatted version of this JSON
DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }'

-- I decided to sell my Golf.  Let's add a new "SellDate" property to the JSON saying when I sold my Volkswagen.
-- If we use strict mode, you'll see we can't add SellDate because the key never existed before
--SELECT JSON_MODIFY(@garage, 'append strict $.Cars[0].SellDate', '2017-02-17T00:00:00.000Z')
-- Output: Property cannot be found on the specified JSON path.

-- However, in lax mode (default), we have no problem adding the SellDate
SELECT JSON_MODIFY(@garage, 'append lax $.Cars[0].SellDate', '2017-02-17T00:00:00.000Z')
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" ,"SellDate":["2017-02-17T00:00:00.000Z"]}, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" }] }

-- After selling my Golf, I bought another car a few days later: A new Volkswagen Golf GTI.  Let's add it to our garge:
-- Note the use of JSON_QUERY; this is so our string is interpreted as a JSON object instead of a plain old string
SET @garage = JSON_MODIFY(@garage, 'append $.Cars', JSON_QUERY('{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }'))
SELECT @garage;
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

Delete property:

DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "PurchaseDate": "2006-10-05T00:00:00.000Z", "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }'

-- Let's remove the PurchaseDate property on my original Volkswagen Golf since it's not relevant anymore:
SET @garage = JSON_MODIFY(@garage, '$.Cars[0].PurchaseDate', NULL)
SELECT @garage
-- Output: { "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

Delete from array (this is not intuitive, see my Microsoft Connect item to fix this: https://connect.microsoft.com/SQLServer/feedback/details/3120404/sql-modify-json-null-delete-is-not-consistent-between-properties-and-arrays)

DECLARE @garage nvarchar(1000) = N'{ "Cars": [{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GLI" }, "Year": 2003, "SellDate" : "2017-02-17T00:00:00.000Z" }, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }'

-- I realize it's not worth keeping the original Volkswagen in my @garage data any longer, so let's completely remove it.
-- Note, if we use NULL as per the MSDN documentation, we don't actually remove the first car element of the array - it just gets replaced with NULL
-- This is problematic if we expect the indexes of our array to shift by -1.
SELECT JSON_MODIFY(@garage, '$.Cars[0]', NULL)
-- Output: { "Cars": [null, { "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

-- To truly delete it (and not have the NULL appear as the first item in the array) we have to convert to a rowset, select everything that's not the first row, aggregate the rows into a string (UGH) and then recreate as JSON.
-- This is incredibly ugly.  The STREAM_AGG() function in SQL vNext should make it a little cleaner, but why doesn't the JSON_MODIFY NULL syntax just get rid of the element in the array?
-- I have opened a Microsoft connect issue for this here: https://connect.microsoft.com/SQLServer/feedback/details/3120404 
SELECT JSON_QUERY('{ "Cars" : [' + 
        STUFF((
               SELECT   ',' + value
               FROM OPENJSON(@garage, '$.Cars') 
               WHERE [key] <> 0
               FOR XML PATH('')), 1, 1, '') + '] }')
-- Output: { "Cars" : [{ "Make": "Subaru", "Model": { "Base": "Impreza", "Trim": "Premium" }, "Year": 2016, "PurchaseDate": "2015-08-18T00:00:00.000Z" },{ "Make": "Volkswagen", "Model": { "Base": "Golf", "Trim": "GTI" }, "Year": 2017, "PurchaseDate": "2017-02-19T00:00:00.000Z" }] }

SQL JSON Performance Tuning

SQL JSON functions are already fast. Adding computed columns and indexes makes them extremely fast.

Computed Column JSON Indexes

JSON indexes are simply regular indexes on computed columns.

Add a computed column:

-- Car data source: https://github.com/arthurkao/vehicle-make-model-data
IF OBJECT_ID('dbo.Cars') IS NOT NULL 
BEGIN
    DROP TABLE dbo.Cars;
END
CREATE TABLE dbo.Cars
(
    Id INT IDENTITY(1,1),
    CarDetails NVARCHAR(MAX)
);
-- See https://gist.github.com/bertwagner/1df2531676112c24cd1ab298fc750eb2 for the full untruncated version of this code
DECLARE @cars nvarchar(max) = '[ {"year":2001,"make":"ACURA","model":"CL"}, {"year":2001,"make":"ACURA","model":"EL"},...]';

INSERT INTO dbo.Cars (CarDetails)
SELECT value FROM OPENJSON(@cars, '$');

SELECT * FROM dbo.Cars;
/* 
Output:
Id          CarDetails
----------- ----------------------------------------------
1           {"year":2001,"make":"ACURA","model":"CL"}
2           {"year":2001,"make":"ACURA","model":"EL"}
3           {"year":2001,"make":"ACURA","model":"INTEGRA"}
...
*/

-- Remember to turn on "Include Actual Execution Plan" for all of these examples

-- Before we add any computed columns/indexes, let's see our execution plan for our SQL statement with a JSON predicate
SELECT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.model') = 'Golf'
/*
Output:
Id          CarDetails
----------- --------------------------------------------------
1113        {"year":2001,"make":"VOLKSWAGEN","model":"GOLF"}
2410        {"year":2002,"make":"VOLKSWAGEN","model":"GOLF"}
3707        {"year":2003,"make":"VOLKSWAGEN","model":"GOLF"}
...
*/
-- The execution plan shows a Table Scan, not very efficient

-- We can now add a non-persisted computed column for our "model" JSON property.
ALTER TABLE dbo.Cars
ADD CarModel AS JSON_VALUE(CarDetails, '$.model');

-- We add the distinct to avoid parameter sniffing issues.  
-- Our execution plan now shows the extra computation that is occuring for every row of the table scan.
SELECT DISTINCT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.model') = 'Golf'
SELECT DISTINCT * FROM dbo.Cars WHERE CarModel = 'Golf'

Add an index to our computed column:

-- Add an index onto our computed column
CREATE CLUSTERED INDEX CL_CarModel ON dbo.Cars (CarModel)

-- Check the execution plans again
SELECT DISTINCT * FROM dbo.Cars WHERE JSON_VALUE(CarDetails, '$.model') = 'Golf'
SELECT DISTINCT * FROM dbo.Cars WHERE CarModel = 'Golf'
-- We now get index seeks!

-- Indexed computed column returns results in ~1ms
SELECT * FROM dbo.Cars WHERE CarModel = 'Golf'

Performance test:

-- Turn on stats and see how long it takes to parse the ~20k JSON array elements
SET STATISTICS TIME ON

-- Test #1
-- Test how long it takes to parse each property from all ~20k elements from the JSON array
-- SQL returns this query in ~546ms
SELECT JSON_VALUE(value, '$.year') AS [Year], JSON_VALUE(value, '$.make') AS Make, JSON_VALUE(value, '$.model') AS Model FROM OPENJSON(@cars, '$') 

-- Test #2
-- Time to deserialize and query just Golfs without computed column + index
-- This takes ~255ms in SQL Server
SELECT * FROM OPENJSON(@cars, '$') WHERE JSON_VALUE(value, '$.model') = 'Golf'

-- Test #3
-- Time it takes to compute the same query for Golf's with a computed column and clustered index 
-- This takes ~1ms on SQL Server
SELECT * FROM dbo.Cars WHERE CarModel = 'Golf'

-- Test #4
-- Serializing data on SQL Server takes ~110ms
SELECT * FROM dbo.Cars FOR JSON AUTO

-- What about serializing/deserializing smaller JSON datasets?
-- Let's create our smaller set
DECLARE @carsSmall nvarchar(max) = '[ {"year":2001,"make":"ACURA","model":"CL"}, {"year":2001,"make":"ACURA","model":"EL"}, {"year":2001,"make":"ACURA","model":"INTEGRA"}, {"year":2001,"make":"ACURA","model":"MDX"}, {"year":2001,"make":"ACURA","model":"NSX"}, {"year":2001,"make":"ACURA","model":"RL"}, {"year":2001,"make":"ACURA","model":"TL"}]';

-- Test #5
-- Running our query results in the data becoming deserialized in ~0ms
SELECT JSON_VALUE(value, '$.year') AS [Year], JSON_VALUE(value, '$.make') AS Make, JSON_VALUE(value, '$.model') AS Model FROM OPENJSON(@carsSmall, '$') 
--30ms in sql

-- Test #6
-- And serialized in ~0ms
SELECT TOP 7  * FROM dbo.Cars FOR JSON AUTO