How To Use Temporal Tables For Easy Point-In-Time Analysis

Published on: 2017-05-23

“Bordeaux, The Grand Theatre” by Stefano Montagner is licensed under CC BY-NC-ND 2.0

You can watch this blog post on YouTube too!

Have you ever needed to look at what data in a table used to look like?

If you have, it probably took a knuckle-cracking filled session of writing group-by statements, nested sub-queries, and window functions to write your time-travelling query.

Sorry for your lost day of productivity — I’ve been there too.

Fortunately for us, SQL Server 2016 introduces a new feature to make our point-in-time analysis queries easy to write: temporal tables.

Temporal Tables? Are Those The Same As Temporary Tables?

Don’t let the similar sounding name fool you: “temporal” <> “temporary”.

Temporal tables consist of two parts:

  1. The temporal table — this is the table that contains the current values of your data.
  2. The historical table — this table holds all of the previous values that at some point existed in your temporal table.

You might have created a similar setup yourself in previous versions of SQL Server using triggers. However, a temporal table differs from that trigger-based approach because:

  1. You don’t need to write any triggers/stored procedures! All of the history tracking is done automatically by SQL Server.
  2. Retrieving point-in-time data uses a simple FOR SYSTEM_TIME clause — no complex querying required.

I want to make my life easier by using temporal tables! Take my money and show me how!

I’m flattered by your offer, but since we are good friends I’ll let you in on these secrets for free.

First let’s create a temporal table. I’m thinking about starting up a car rental business, so let’s model it after that:

IF OBJECT_ID('dbo.CarInventory', 'U') IS NOT NULL 
BEGIN
 -- When deleting a temporal table, we need to first turn versioning off
 ALTER TABLE [dbo].[CarInventory] SET ( SYSTEM_VERSIONING = OFF  ) 
 DROP TABLE dbo.CarInventory
 DROP TABLE dbo.CarInventoryHistory
END
CREATE TABLE CarInventory   
(    
 CarId INT IDENTITY PRIMARY KEY,
 Year INT,
 Make VARCHAR(40),
 Model VARCHAR(40),
 Color VARCHAR(10),
 Mileage INT,
 InLot BIT NOT NULL DEFAULT 1,
 SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
 SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
 PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)     
)   
WITH 
( 
 SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory)   
)

The key things to note with our new table above are that

  1. it contains a PRIMARY KEY.
  2. it contains two datetime2 fields, marked with GENERATED ALWAYS AS ROW START/END.
  3. it contains the PERIOD FOR SYSTEM_TIME clause.
  4. it contains the SYSTEM_VERSIONING = ON option with the (optional) historical table name (dbo.CarInventoryHistory).

If we query our newly created tables, you’ll notice our column layouts are identical:

SELECT * FROM dbo.CarInventory
SELECT * FROM dbo.CarInventoryHistory
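
If you want to double-check that SQL Server really did link the two tables together, the temporal metadata is exposed in sys.tables. Here is a minimal sketch of that sanity check (the column aliases are my own):

SELECT
 t.name AS TemporalTableName,
 t.temporal_type_desc, -- should read SYSTEM_VERSIONED_TEMPORAL_TABLE
 h.name AS HistoryTableName
FROM
 sys.tables t
 LEFT JOIN sys.tables h
  ON t.history_table_id = h.object_id
WHERE
 t.name = 'CarInventory'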

Let’s fill it with the choice car of car rental agencies all across the U.S. — the Chevy Malibu:

INSERT INTO dbo.CarInventory (Year,Make,Model,Color,Mileage) VALUES(2017,'Chevy','Malibu','Black',0)
INSERT INTO dbo.CarInventory (Year,Make,Model,Color,Mileage) VALUES(2017,'Chevy','Malibu','Silver',0)
Although we got some unassuming car models, at least we can express our individuality with two different paint colors!

In all of the remaining screenshots, the top result is our temporal table dbo.CarInventory and the bottom result is our historical table dbo.CarInventoryHistory.

You’ll notice that since we’ve only inserted one row for each of our cars, there’s no row history yet and therefore our historical table is empty.

Let’s change that by getting some customers and renting out our cars!

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 1
UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

Now we see our temporal table at work: we updated the rows in dbo.CarInventory and our historical table was automatically updated with our original values as well as timestamps for how long those rows existed in our table.

After a while, our customers return their rental cars:

UPDATE dbo.CarInventory SET InLot = 1, Mileage = 73  WHERE CarId = 1
UPDATE dbo.CarInventory SET InLot = 1, Mileage = 488 WHERE CarId = 2
It’s totally possible for someone to have driven 73 or 488 miles in a Chevy Malibu in under 4 minutes…ever hear the phrase “drive it like a rental”?

Our temporal table shows the current state of our rental cars: the customers have returned the cars to our lot and each car has accumulated some mileage.

Our historical table meanwhile got a copy of the rows from our temporal table right before our last UPDATE statement. It’s automatically keeping track of all of this history for us!
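
If you’d rather see the current and historical versions side by side instead of querying the two tables separately, FOR SYSTEM_TIME ALL returns every version of every row in a single result set. A quick sketch:

-- Every version of every car, current and historical, in one result
SELECT
 CarId,
 InLot,
 Mileage,
 SysStartTime,
 SysEndTime
FROM
 dbo.CarInventory
FOR SYSTEM_TIME ALL
ORDER BY
 CarId,
 SysStartTime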

Continuing on, business is going well at the car rental agency. We get another customer to rent our silver Malibu:

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

Unfortunately, our second customer gets into a crash and destroys our car:

DELETE FROM dbo.CarInventory WHERE CarId = 2
The customer walked away from the crash unscathed; the same cannot be said for our profits.

With the deletion of our silver Malibu, our test data is complete.

Now that we have all of this great historically tracked data, how can we query it?

If we want to reminisce about better times when both cars were damage free and we were making money, we can write a query using SYSTEM_TIME AS OF to show us what our table looked like at that point in the past:

SELECT
 *
FROM 
 dbo.CarInventory
FOR SYSTEM_TIME AS OF '2017-05-18 23:49:50'
The good old days.
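
AS OF isn’t the only time-travel option either: FOR SYSTEM_TIME also supports FROM ... TO, BETWEEN ... AND, and CONTAINED IN, which return every row version that was active during a range rather than at a single instant. A hedged sketch, reusing the made-up timestamp from above:

-- Every row version that was active at any point during this window
SELECT
 *
FROM
 dbo.CarInventory
FOR SYSTEM_TIME FROM '2017-05-18 23:49:50' TO '2017-05-19 00:00:00'
ORDER BY
 CarId,
 SysStartTime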

And if we want to do some more detailed analysis, like what rows have been deleted, we can query both temporal and historical tables normally as well:

-- Find the CarIds of cars that have been wrecked and deleted
SELECT DISTINCT
 h.CarId AS DeletedCarId
FROM
 dbo.CarInventory t
 RIGHT JOIN dbo.CarInventoryHistory h
  ON t.CarId = h.CarId 
WHERE 
 t.CarId IS NULL
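
And because the historical table stores the period columns too, we can go one step further and see when each wrecked car was removed: the SysEndTime of a deleted car’s last history row is the moment the DELETE happened. A minimal sketch (the period columns are stored in UTC):

-- The last history row for a deleted car closes at the time of the DELETE
SELECT
 h.CarId AS DeletedCarId,
 MAX(h.SysEndTime) AS DeletedOnUtc
FROM
 dbo.CarInventoryHistory h
WHERE
 NOT EXISTS (SELECT 1 FROM dbo.CarInventory t WHERE t.CarId = h.CarId)
GROUP BY
 h.CarId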

C̶o̶l̶l̶i̶s̶i̶o̶n̶ Conclusion

Even with my car rental business not working out, at least we were able to see how SQL Server’s temporal tables helped us keep track of our car inventory data.

I hope you got as excited as I did the first time I saw temporal tables in action, especially when it comes to querying with FOR SYSTEM_TIME AS OF. Long gone are the days of needing complicated queries to rebuild data for a certain point in time.

 

Thanks for reading. You might also enjoy following me on Twitter.


XML vs JSON Shootout: Which is Superior in SQL Server 2016?

Published on: 2017-05-16

“A duel is a duel” by Emanuele Rosso is licensed under CC BY-NC-ND 2.0

You can watch this blog post on YouTube too!

Additional performance comparisons available in an updated post.

Starting with the 2016 release, SQL Server offers native JSON support. Although the implementation is not perfect, I am still a huge fan.

Even if a new feature like JSON support is awesome, I am only likely to use it if it is practical and performs better than the alternatives.

Today I want to pit JSON against XML and see which is the better format to use in SQL Server.

Enter XML, SQL’s Bad Hombre

Full disclosure: I don’t love XML and I also don’t love SQL Server’s implementation of it.

XML is too wordy (lots of characters wasted on closing tags), it has elements AND attributes (I don’t like having to program for two different scenarios), and depending on what language you are programming in, sometimes you need schema files and sometimes you don’t.

SQL Server’s implementation of XML does have some nice features like a dedicated datatype that reduces storage space and validates syntax, but I find the querying of XML to be clumsy.

All XML grievances aside, I am still willing to use XML if it outperforms JSON. So let’s run some test queries!

Is JSON SQL Server’s New Sheriff in Town?

Although performance is the final decider in these comparison tests, I think JSON has a head start over XML purely in terms of usability. SQL Server’s JSON function signatures are easier to remember and cleaner to write on screen.

The test data I’m using is vehicle year/make/model data from https://github.com/arthurkao/vehicle-make-model-data. Here’s what it looks like once I loaded it into a table called dbo.XmlVsJson:

CREATE TABLE dbo.XmlVsJson
(
  Id INT IDENTITY PRIMARY KEY,
  XmlData XML,
  JsonData NVARCHAR(MAX)
)
(The full data query is available in this gist if you want to play along at home)
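
I won’t reproduce the full load script here, but to give a feel for the shape of the data, here is a tiny hand-rolled insert. The element and property names are assumptions on my part, based on the paths used by the queries later in this post (/cars/car and $.cars):

-- A made-up one-car sample row, just to illustrate the structure
INSERT INTO dbo.XmlVsJson (XmlData, JsonData)
VALUES
(
  '<cars><car><year>2017</year><make>ACURA</make><model>ILX</model></car></cars>',
  N'{ "cars" : [ { "year" : 2017, "make" : "ACURA", "model" : "ILX" } ] }'
)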

Data Size

So XML should be larger, right? It’s got all of those repetitive closing tags, after all.

SELECT
  DATALENGTH(XmlData)/1024.0/1024.0 AS XmlMB,
  DATALENGTH(JsonData)/1024.0/1024.0 AS JsonMB
FROM
  dbo.XmlVsJson

Turns out the XML is actually smaller! How can this be? This is the magic behind the SQL Server XML datatype. SQL doesn’t store XML as a giant string; it stores only the XML InfoSet, leading to a reduction in space.

The JSON on the other hand is stored as regular old nvarchar(max) so its full string contents are written to disk. XML wins in this case.

INSERT Performance

So XML stored in the XML data type physically takes up less space than JSON stored as nvarchar(max). Does that mean it will insert faster as well? Here’s our query that inserts 100 duplicates of the row from our first query:

SET STATISTICS TIME ON
INSERT INTO dbo.XmlVsJson (XmlData)
SELECT XmlData FROM dbo.XmlVsJson 
  CROSS APPLY 
  (
    SELECT DISTINCT number 
    FROM master..spt_values 
    WHERE number BETWEEN 1 AND 100
  )t WHERE Id = 1
GO
INSERT INTO dbo.XmlVsJson (JsonData)
SELECT JsonData FROM dbo.XmlVsJson 
  CROSS APPLY 
  (
    SELECT DISTINCT number 
    FROM master..spt_values 
    WHERE number BETWEEN 1 AND 100
  )t WHERE Id = 1
GO

And the results? Inserting the 100 XML rows took 613ms on my machine, while inserting the 100 JSON rows took 1305ms…XML wins again!

JSON ain’t looking too hot. Wait for it…

Since the XML data type physically stores less data, it makes sense that it would also get written out to the table faster.

CRUD Operations

I’m incredibly impressed by SQL Server’s JSON performance when compared to .NET — but how does it compare to XML on SQL Server?

Read

Let’s select the fragment for our second car from our XML and JSON:

SELECT t.XmlData.query('/cars/car[2]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1
SELECT JSON_QUERY(t.JsonData, '$.cars[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

Result? JSON wins (at 0ms vs 63ms for XML) when needing to pluck out a fragment from our larger object string.

What if we want to grab a specific value instead of a fragment?

SELECT t.XmlData.value('(/cars/car[2]/model)[1]', 'varchar(100)') FROM dbo.XmlVsJson t 
WHERE Id = 1
SELECT JSON_VALUE(t.JsonData, '$.cars[1].model') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

Once again JSON wins with 0ms vs 11ms for XML.

If you look at the execution plans for these last two queries, it’s easy to see that XML has a lot more to do behind the scenes to retrieve the data:

(Execution plan screenshots: the XML plan is considerably busier than the JSON plan.)

Create

We saw above that inserting rows of XML data is faster than inserting rows of JSON, but what if we want to insert new data into the object strings themselves? Here I want to insert the property “mileage” into the first car object:

UPDATE t SET XmlData.modify('
insert <mileage>100,000</mileage>
into (/cars/car[1])[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1
UPDATE t SET JsonData = JSON_MODIFY(JsonData,
'$.cars[0].mileage','100,000') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

In addition to the cleaner syntax (JSON_MODIFY() is essentially the same as a REPLACE()), the JSON insert runs in 22ms compared to 206ms for XML. Another JSON win.

Update

Let’s update the mileage properties we just added to have values of 110,000:

UPDATE t SET XmlData.modify('
replace value of (/cars/car[1]/mileage/text())[1]
with     "110,000"') 
FROM dbo.XmlVsJson t
WHERE Id = 1
UPDATE t SET JsonData = JSON_MODIFY(JsonData, '$.cars[0].mileage','110,000') 
FROM dbo.XmlVsJson t
WHERE Id = 1

Result? JSON has the quicker draw and was able to perform this update in 54ms vs XML’s 194ms.

Delete

Deleting large string data, a DBA’s dream *snicker*.

Let’s delete the mileage property, undoing all of that hard work we just did:

UPDATE t SET XmlData.modify('
delete /cars/car[1]/mileage[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1
UPDATE t SET JsonData = JSON_MODIFY(JsonData, '$.cars[0].mileage', null) 
FROM dbo.XmlVsJson t 
WHERE Id = 1

JSON doesn’t take any time to reload and wins against XML again, 50ms to 159ms.

Read Part 2: Indexes

So above we saw that JSON was faster than XML at reading fragments and properties from a single row of serialized data. But our SQL Servers probably have LOTS of rows of data — how well does indexed data parsing do in our matchup?

First let’s expand our data — instead of storing all of our car objects in a single field, let’s build a new table that has each car on its own row:

(once again, full dataset at GitHub if you are playing along at home)
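
The new table’s definition isn’t shown in the post (it lives in the GitHub script), so here is a hedged sketch of what building dbo.XmlVsJson2 could look like. The table and column names match the index queries below, but the shredding logic is my own assumption:

-- Assumed structure: same columns as before, but one car per row
DROP TABLE IF EXISTS dbo.XmlVsJson2
CREATE TABLE dbo.XmlVsJson2
(
  Id INT IDENTITY PRIMARY KEY,
  XmlData XML,
  JsonData NVARCHAR(MAX)
)
-- Shred the big XML document into one <car> fragment per row
INSERT INTO dbo.XmlVsJson2 (XmlData)
SELECT c.query('.')
FROM dbo.XmlVsJson t
  CROSS APPLY t.XmlData.nodes('/cars/car') x(c)
WHERE t.Id = 1
-- Shred the big JSON array into one car object per row
INSERT INTO dbo.XmlVsJson2 (JsonData)
SELECT j.[value]
FROM dbo.XmlVsJson t
  CROSS APPLY OPENJSON(t.JsonData, '$.cars') j
WHERE t.Id = 1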

Now that we have our expanded data in our table, let’s add some indexes. The XML datatype in SQL Server has its own types of indexes, while JSON simply needs a computed column with a regular index applied to it.

DROP INDEX IF EXISTS PXML_XmlData ON XmlVsJson2
CREATE PRIMARY XML INDEX PXML_XmlData
ON XmlVsJson2 (XmlData);
ALTER TABLE dbo.XmlVsJson2
ADD MakeComputed AS JSON_VALUE(JsonData, '$.make')
CREATE NONCLUSTERED INDEX IX_JsonData ON dbo.XmlVsJson2 (MakeComputed)

(Note: I also tried adding an XML secondary index for even better performance, but I couldn’t get the query engine to use that secondary index on such a basic dataset)

If we try to find all rows that match a predicate:

SELECT Id, XmlData 
FROM dbo.XmlVsJson2 t 
WHERE t.XmlData.exist('/car/make[.="ACURA"]') = 1
SELECT Id, JsonData 
FROM dbo.XmlVsJson2 t 
WHERE JSON_VALUE(t.JsonData, '$.make') = 'ACURA'

XML is able to filter down to the 96 matching rows in 200ms and JSON accomplishes the same in 9ms. A final win for JSON.

Conclusion

If you need to store and manipulate serialized string data in SQL Server, there’s no question: JSON is the format of choice. Although JSON’s storage size is a little larger than its XML predecessor, SQL Server’s JSON functions outperform XML in speed in nearly all cases.

Is there enough performance difference to rewrite all of your old XML code to JSON? Probably not, but every case is different.

One thing is clear: new development should consider taking advantage of SQL Server’s new JSON functions.

 

Thanks for reading. You might also enjoy following me on Twitter.


One SQL Cheat Code For Amazingly Fast JSON Queries

Published on: 2017-05-09

How non-persisted computed column indexes make your JSON queries high performance

You can watch this blog post on YouTube too!

Recently I’ve been working with JSON in SQL Server 2016 a lot.

One of the hesitations many people have with using JSON in SQL Server is that they think querying it must be really slow — SQL is supposed to excel at relational data, not string parsing, right?

It turns out that performance is pretty good with the standalone SQL Server JSON functions. Even better is that it’s possible to make queries against JSON data run at ludicrous speeds by using indexes on JSON parsed computed columns. In this post I want to take a look at how SQL is able to parse* with such great performance.

*“Parse” here is actually a lie — it’s doing something else behind the scenes. You’ll see what I mean; keep reading!

Computed Columns in SQL Server

The only way to get JSON indexes working on SQL Server is to use a computed column. A computed column is basically a column that performs a function to calculate its values.

For example, let’s say we have a table with some car JSON data in it:

DROP TABLE IF EXISTS dbo.DealerInventory;
CREATE TABLE dbo.DealerInventory
(
  Id int IDENTITY(1,1) PRIMARY KEY,
  Year int,
  JsonData nvarchar(300)
);
INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Volkswagen", "Model" : "Golf" }');
INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Honda", "Model" : "Civic" }');
INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Subaru", "Model" : "Impreza" }');
SELECT * FROM dbo.DealerInventory;
/* Output:
Id    Year     JsonData
----- -------- ---------------------------------------------
1     2017     { "Make" : "Volkswagen", "Model" : "Golf" }
2     2017     { "Make" : "Honda", "Model" : "Civic" }
3     2017     { "Make" : "Subaru", "Model" : "Impreza" }
*/

We can add a new computed column to the table, “Make”, which parses and extracts the Make property from each row’s JSON string:

ALTER TABLE dbo.DealerInventory
ADD Make AS JSON_VALUE(JsonData, '$.Make');
SELECT * FROM dbo.DealerInventory;
/* Output:
Id Year  JsonData                                    Make
-- ----- ------------------------------------------- ----------
1  2017  { "Make" : "Volkswagen", "Model" : "Golf" } Volkswagen
2  2017  { "Make" : "Honda", "Model" : "Civic" }     Honda
3  2017  { "Make" : "Subaru", "Model" : "Impreza" }  Subaru
*/

By default, the above Make computed column is non-persisted, meaning its values are never stored to the database (persisted computed columns can also be created, but that’s a topic for a different time). Instead, every time a query runs against our dbo.DealerInventory table, SQL Server will calculate the value for each row.

The performance of this isn’t great — it’s essentially a scalar function running for each row of our output :(. However, when you combine a computed column with an index, something interesting happens.

Time to dive in with DBCC Page

DBCC PAGE is an undocumented SQL Server command that shows what the raw data stored in a data page looks like. Pages are the 8KB units SQL Server uses to store its data.

In the rest of this post we’ll be looking at how data pages (where the actual table data in SQL is stored) and index pages (where our index data is stored) are affected by non-persisted computed columns — and how they make JSON querying super fast.

First, let’s take a look at the existing data we have. We do this by first turning on trace flag 3604 and using DBCC IND to get the page ids of our data. Additional details on the column definitions in DBCC IND and DBCC PAGE can be found in Paul Randal’s blog post on the topic.

DBCC TRACEON(3604);
-- "Sandbox" is the name of my database
DBCC IND('Sandbox','dbo.DealerInventory',-1);

If you look at the results above, row 2 contains our data page (indicated by PageType = 1) and the PagePID of that page is 305088 (if you are playing along at home, your PagePID is most likely something else). If we then look up that PagePID using DBCC PAGE we get something like this:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

You can see our three rows of data highlighted in red. The important thing to note here is that our computed column of the parsed “Make” value is truly non-persisted and nowhere to be found, meaning it has to be generated for every row during query execution.

Now, what if we add an index to our non-persisted computed column and then run DBCC IND again:

CREATE NONCLUSTERED INDEX IX_ParsedMake ON dbo.DealerInventory (Make)
DBCC IND('Sandbox','dbo.DealerInventory',-1);

You’ll now notice that in addition to data page 305088 (PageType = 1), we also have an index page 305096 (PageType = 2). If we examine both the data page and the index page we see something interesting:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS
DBCC PAGE('Sandbox',1,305096,3) WITH TABLERESULTS

Nothing has changed with our data page.

But our index page now contains the parsed values for our “Make” column.

What does this mean? I thought non-persisted computed columns weren’t saved to disk!

Exactly right: our non-persisted computed column “Make” isn’t saved to the data page on the disk. However if we create an index on our non-persisted computed column, the computed value is persisted on the index page!

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.
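
To see the cheat code pay off, filter on the computed column and check the execution plan. On a realistically sized table, the optimizer can satisfy this predicate with a seek on IX_ParsedMake instead of scanning and parsing JsonData for every row. A quick sketch:

-- With IX_ParsedMake in place, this predicate can be answered from the
-- index's pre-parsed values (plus a key lookup for the remaining columns)
SELECT Id, Year, JsonData
FROM dbo.DealerInventory
WHERE Make = 'Honda';

Filtering directly on JSON_VALUE(JsonData, '$.Make') = 'Honda' can use the same index too, since SQL Server is able to match that expression back to the computed column definition.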

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.

 

Thanks for reading. You might also enjoy following me on Twitter.
