Inverted Polygons? How to Troubleshoot SQL Server's Left Hand Rule


Last week we looked at how easy it is to import GeoJSON data into SQL Server's geography datatype.

Sometimes your source data won't be perfectly formatted for SQL Server's spatial datatypes though.

Today we'll examine what to do when our geographical polygon is showing us inverted results.

Watch this week's video on YouTube

Colorado Is A Rectangle

If you look at the state of Colorado on a map, you'll notice its border is pretty much a rectangle.

Roughly marking the lat/long coordinates of the state's four corners will give you a polygon comprised of the following points:

[Image: map of Colorado with the lat/long coordinates of its four corners marked]

Or in GeoJSON format (set equal to a SQL variable) you might represent this data like so:

DECLARE @Colorado nvarchar(max) = N'
{
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [-109.05005693435669,
                        41.0006946252774
                    ],
                    [-102.05157816410065,
                        41.002362600596015
                    ],
                    [-102.0421314239502,
                        36.993139985820925
                    ],
                    [-109.04520750045776,
                        36.99898824162522
                    ],

                    [-109.05005693435669,
                        41.0006946252774
                    ]
                ]
            ]
        }
    }]
}
'

Note: four points + one extra point that is a repeat of our first point - this last repeated point lets us know that we have a closed polygon since it ends at the same point where it began.

Viewing Our Colorado Polygon

Converting this array of points to the SQL Server geography datatype is pretty straightforward:

SELECT
    geography::STPolyFromText(
        'POLYGON ((' + STRING_AGG(CAST(Long + ' ' + Lat as varchar(max)), ',') + '))'
        ,4326) AS StateBoundary
FROM
    (
    SELECT 
        Long,
        Lat
    FROM
        OPENJSON(@Colorado, '$.features[0].geometry.coordinates[0]')
        WITH
            (
                Long varchar(100) '$[0]',
                Lat varchar(100) '$[1]'
            )
)d

We can then take a look at SQL Server Management Studio's Spatial Results tab and see our polygon of Colorado drawn on a map.  You might notice something looks a little funny with this picture though:

[Image: SSMS Spatial Results tab showing the inverted rectangle]

WHY IS MY POLYGON AREA INVERTED?!!??!

Discerning eyes might notice that SQL Server didn't shade in the area inside of the polygon - it instead shaded in everything in the world EXCEPT for the interior of our polygon.

If this is the first time you've encountered this behavior then you're probably confused by it - I know I was.

The Left-Hand/Right-Hand Rules

There is a logical explanation though for why SQL Server is seemingly shading in the wrong part of our polygon.

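SQL Server's geography datatype follows the "left-hand rule" when determining which side of the polygon should be shaded.  GeoJSON data doesn't always arrive in that orientation.  The current GeoJSON specification (RFC 7946) describes its winding order as the "right-hand rule," which means exterior rings should be counter-clockwise - the same direction SQL Server wants.  However, the original 2008 GeoJSON specification didn't discuss winding order at all, and RFC 7946 tells parsers not to reject polygons that break the rule, so clockwise exterior rings like the one in our Colorado example are common in the wild.

The left-hand rule works like this: imagine you are walking the path of the polygon - whatever is to the left of the line you are walking is what is considered the "interior" of that polygon.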

So if we draw arrows that point in the direction that the coordinates are listed in our GeoJSON, you'll notice we are making our polygon in a clockwise direction:

[Image: the Colorado polygon with arrows showing the clockwise order of its points]

If you imagine yourself walking along this line in the direction specified, you'll quickly see why SQL Server shades the "outside" of the polygon: following the left-hand rule, everything except for the state of Colorado is considered the interior of our polygon shape.
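If you'd like to check which way a ring runs before it ever reaches SQL Server, the sign of the shoelace-formula area will tell you. Below is a minimal Python sketch of that check. The function names are hypothetical, and it treats longitude/latitude as flat x/y values, which is an assumption that holds for a small ring like Colorado but not for rings that cross the antimeridian or wrap around a pole.

```python
def signed_area(ring):
    """Shoelace formula over [lon, lat] pairs of a closed ring.

    Positive means the points run counter-clockwise,
    negative means they run clockwise (flat x/y assumption).
    """
    total = 0.0
    for a, b in zip(ring, ring[1:]):
        total += a[0] * b[1] - b[0] * a[1]
    return total / 2.0


def is_counter_clockwise(ring):
    return signed_area(ring) > 0


# The Colorado ring exactly as it appears in the GeoJSON above
colorado = [
    [-109.05005693435669, 41.0006946252774],
    [-102.05157816410065, 41.002362600596015],
    [-102.0421314239502, 36.993139985820925],
    [-109.04520750045776, 36.99898824162522],
    [-109.05005693435669, 41.0006946252774],
]

print(is_counter_clockwise(colorado))        # clockwise ring
print(is_counter_clockwise(colorado[::-1]))  # same ring, reversed
```

A clockwise result here is the signal that the geography datatype will treat the outside of the shape as its interior.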

Reversing Polygon Direction

So the problem here is that our polygon data was encoded in a different direction than the SQL Server geography datatype expects.

One way to fix this is to correct our source data by reordering the points so that the polygon is drawn in a counter-clockwise direction:

-- Note: The middle three sets of points have been included in reverse order while the first/last point have stayed the same
DECLARE @ColoradoReversed nvarchar(max) = N'
{
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [-109.05005693435669,
                        41.0006946252774
                    ],
                    [-109.04520750045776,
                        36.99898824162522
                    ],
                    [-102.0421314239502,
                        36.993139985820925
                    ],

                    [-102.05157816410065,
                        41.002362600596015
                    ],

                    [-109.05005693435669,
                        41.0006946252774
                    ]
                ]
            ]
        }
    }]
}
'

This is pretty easy to do with a polygon that only has five points, but this would be a huge pain for a polygon with hundreds or thousands of points.

So how do we solve this in a more efficient manner?

Easy: use SQL Server's ReorientObject() function.

SELECT
    geography::STPolyFromText(
        'POLYGON ((' + STRING_AGG(CAST(Long + ' ' + Lat as varchar(max)), ',') + '))'
        ,4326).ReorientObject() AS StateBoundary
FROM
    (
    SELECT 
        Long,
        Lat
    FROM
        OPENJSON(@Colorado, '$.features[0].geometry.coordinates[0]')
        WITH
            (
                Long varchar(100) '$[0]',
                Lat varchar(100) '$[1]'
            )
)d

ReorientObject() does what we did manually above - it manipulates the order of our polygon's points so that it changes the direction in which the polygon is drawn.

Note: SQL uses a different order when reversing the points using ReorientObject() than the way we reversed them above.  The end result ends up being the same however.

Regardless of which method you choose to use, the results are the same: our polygon of Colorado is now drawn in the correct direction and the Spatial Results tab visually confirms this for us:

[Image: SSMS Spatial Results tab showing the Colorado polygon shaded correctly]
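One thing to keep in mind: ReorientObject() swaps interior and exterior no matter what it is given, so running it on a polygon that was already counter-clockwise would invert it. If your sources are mixed, one option is to normalize the rings before the import. The Python sketch below is one way that could look, under the same flat x/y assumption as any planar area test; the function names are hypothetical. It makes the exterior ring counter-clockwise and any holes clockwise, which is what the left-hand rule implies.

```python
def ring_is_counter_clockwise(ring):
    # Sign of the shoelace sum; only the first two values of each
    # position are used, so an optional altitude is ignored.
    twice_area = sum(a[0] * b[1] - b[0] * a[1] for a, b in zip(ring, ring[1:]))
    return twice_area > 0


def orient_for_left_hand_rule(polygon_coordinates):
    """Return GeoJSON Polygon coordinates with rings reversed only where needed."""
    oriented = []
    for index, ring in enumerate(polygon_coordinates):
        want_ccw = index == 0  # first ring is the exterior, the rest are holes
        if ring_is_counter_clockwise(ring) != want_ccw:
            ring = ring[::-1]
        oriented.append(ring)
    return oriented


clockwise_colorado = [[
    [-109.05005693435669, 41.0006946252774],
    [-102.05157816410065, 41.002362600596015],
    [-102.0421314239502, 36.993139985820925],
    [-109.04520750045776, 36.99898824162522],
    [-109.05005693435669, 41.0006946252774],
]]

fixed = orient_for_left_hand_rule(clockwise_colorado)
# Running it a second time leaves the rings untouched
assert orient_for_left_hand_rule(fixed) == fixed
```

With data normalized this way, the import query would not need the ReorientObject() call at all.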

Importing GeoJSON Earthquake Data Into SQL Server

A significant portion of Yellowstone National Park sits on top of a supervolcano.  Although it's not likely to erupt any time soon, the park is constantly monitored for geological events like earthquakes.

This week I want to take a look at how you can import this earthquake data, encoded in GeoJSON format, into SQL Server in order to be able to analyze it using SQL Server's spatial functions.


GeoJSON

The source for the data we'll be using is the 30-day earthquake feed from the USGS.  This data is encoded in the GeoJSON format, a specification that makes it easy to share spatial data via JSON.  To get an idea of how it looks, here's an extract:

{
    "type": "FeatureCollection",
    "metadata": {
        "generated": 1515603955000,
        "url": "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_month.geojson",
        "title": "USGS Magnitude 1.0+ Earthquakes, Past Month",
        ...
    },
    "features": [{
            "type": "Feature",
            "properties": {
                "mag": 1.25,
                "place": "7km NW of The Geysers, CA",
                "time": 1515603166510,
                ...
            },
            "geometry": {
                "type": "Point",
                "coordinates": [-122.8221664, 38.8175011, 2.02]
            },
            "id": "nc72952110"
        },
        ...
        ],
    "bbox": [-179.921, -62.519, -3.35, 179.9551, 85.3951, 608.58]
}

The key thing we'll be examining in this data is the "features" array: it contains one feature object for each earthquake that's been recorded in the past 30 days.  You can see the "geometry" child object contains lat/long coordinates that we'll be importing into SQL Server.

If you want the same 30-day GeoJSON extract we'll be using in all of the following demo code, you can download it here.

Importing GeoJSON into SQL Server

There's no out of the box way to import GeoJSON data into SQL Server.

However, using SQL Server's JSON functions we can build our own solution pretty easily.

First, let's create a table where we can store all of our earthquake data:

CREATE TABLE dbo.EarthquakeData
(
    Id int IDENTITY PRIMARY KEY,
    EventDate DATETIME2,
    Magnitude FLOAT,
    Place VARCHAR(300),
    Coordinates GEOGRAPHY,
    Long varchar(100),
    Lat varchar(100)
)

CREATE SPATIAL INDEX IX_Coordinates ON dbo.EarthquakeData (Coordinates)

Then, let's use the OPENJSON() function to parse our JSON and insert it into our table:

DECLARE @GeoJSON nvarchar(max) = N'{...}' -- Put the GeoJSON here

INSERT INTO dbo.EarthquakeData (EventDate,Magnitude,Place,Coordinates,Long,Lat)
SELECT 
    -- Convert unix epoch time to datetime
    -- We also know the source is in UTC, so we specify that using AT TIME ZONE for easier conversions down the road
    DATEADD(second,cast(UnixMillisecondsSinceEpoch/1000 as int),'1970-01-01 00:00:00') AT TIME ZONE 'UTC' AS EventDate,
    Magnitude,
    Place,
    -- Build our geography datatype
    geography::STPointFromText('POINT ('+Long + ' ' + Lat + ')', 4326) AS Geography,
    Long,
    Lat
FROM 
    OPENJSON(@GeoJSON, '$.features')
        WITH (
            UnixMillisecondsSinceEpoch bigint '$.properties.time',
            Magnitude float '$.properties.mag',
            Place varchar(300) '$.properties.place',
            Long varchar(100) '$.geometry.coordinates[0]',
            Lat varchar(100) '$.geometry.coordinates[1]'
        )

We use OPENJSON() to parse our JSON hierarchy and then concatenate together the lat and long values into our well known text format to be able to use it with SQL Server's spatial function STPointFromText:

SELECT geography::STPointFromText('POINT ('+Long + ' ' + Lat + ')', 4326) AS Geography

What results is our earthquake data all nicely parsed out into our dbo.EarthquakeData table:

[Image: query results showing the parsed rows in dbo.EarthquakeData]
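If it helps to see the same two transformations outside of T-SQL, here is a small Python sketch that turns one GeoJSON feature into the values the INSERT above produces: a UTC timestamp from the millisecond epoch and a well-known text POINT in "longitude latitude" order. The function name and the dictionary keys are hypothetical; the sample feature is the one from the extract earlier in this post.

```python
from datetime import datetime, timezone


def feature_to_row(feature):
    """Map one GeoJSON earthquake feature to column-like values."""
    props = feature["properties"]
    # GeoJSON positions are [longitude, latitude, optional depth]
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        # Milliseconds since the epoch, truncated to whole seconds like the SQL
        "EventDate": datetime.fromtimestamp(props["time"] // 1000, tz=timezone.utc),
        "Magnitude": props["mag"],
        "Place": props["place"],
        # Well-known text expects longitude first, then latitude
        "Wkt": f"POINT ({lon} {lat})",
    }


sample_feature = {
    "type": "Feature",
    "properties": {"mag": 1.25, "place": "7km NW of The Geysers, CA", "time": 1515603166510},
    "geometry": {"type": "Point", "coordinates": [-122.8221664, 38.8175011, 2.02]},
    "id": "nc72952110",
}

row = feature_to_row(sample_feature)
print(row["EventDate"].isoformat(), row["Wkt"])
```

The longitude-first ordering is the part that is easiest to get backwards, since most people say "lat/long" out loud.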

What about Yellowstone?

The above data includes earthquakes from around the world.  Since we only want to examine earthquakes in Yellowstone, we'll need to filter the data down.

There's a handy Place column in the data that we could probably add a LIKE '%yellowstone%' filter to - but this is a post about spatial data in SQL, we can do better!

The Wyoming State Geological Survey website has Shapefiles for the boundary of Yellowstone National Park.  Since we are practicing our GeoJSON import skills, I converted the Shapefiles to GeoJSON using an online converter and the resulting data looks like this:

{
    "type": "FeatureCollection",
    "bbox": [-111.15354785438899, 44.13238494057162, -109.816111731858, 45.102865336505396],
    "features": [{
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [-111.0970801722779, 44.487322588834374],
                    [-111.09707931336956, 44.48929779632266],
                    [-111.09707877845554, 44.49109904485493],
                    [-111.09707888353985, 44.49472122457225],
                    ...
                ]
            ]
        },
        ...
    }]
}

You can download the full park boundary GeoJSON file here.

Just like before, we'll use SQL Server's OPENJSON() function to parse our GeoJSON data into a well-known text POLYGON.

First we create our table:

DROP TABLE IF EXISTS dbo.ParkBoundaries
CREATE TABLE dbo.ParkBoundaries
(
    Id int IDENTITY PRIMARY KEY,
    ParkName varchar(100),
    ParkBoundary GEOGRAPHY
)
CREATE SPATIAL INDEX IX_Boundary ON dbo.ParkBoundaries (ParkBoundary)

And then populate it, this time using the STPolyFromText spatial function:

DECLARE @Yellowstone nvarchar(max) = N'{...}' -- Put the park boundary GeoJSON here

INSERT INTO dbo.ParkBoundaries (ParkName, ParkBoundary)
SELECT
    'Yellowstone National Park' AS ParkName,
    geography::STPolyFromText('POLYGON ((' + STRING_AGG(CAST(Long + ' ' + Lat as varchar(max)), ',') + '))',4326).ReorientObject() AS ParkBoundary
FROM
    (
    SELECT 
        Long,
        Lat
    FROM
        OPENJSON(@Yellowstone, '$.features[0].geometry.coordinates[0]')
        WITH
            (
                Long varchar(100) '$[0]',
                Lat varchar(100) '$[1]'
            )
)d

Filtering our data

Now we have two tables: dbo.EarthquakeData and dbo.ParkBoundaries.  What we want to do is select only the Earthquake data points that fall within the boundaries of Yellowstone National Park.

This is easy to do using the STIntersects spatial function, which returns a "1" for any rows where one geography instance (our lat/long earthquake coordinate) intersects another geography instance (our park boundary):

SELECT 
    d.Id,
    d.Magnitude,
    d.EventDate,
    d.Place,
    b.ParkName
FROM 
    dbo.EarthquakeData d
    CROSS JOIN dbo.ParkBoundaries b
WHERE
    d.Coordinates.STIntersects(b.ParkBoundary) = 1
    AND b.ParkName = 'Yellowstone National Park'
ORDER BY
    d.Magnitude DESC
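To get a feel for the kind of test STIntersects is answering for a point and a polygon, here is a rough Python sketch using the classic ray-casting technique. It is not how SQL Server implements the function (the geography type works on an ellipsoid, this works on flat x/y), and the rectangle used as a boundary is only a hypothetical stand-in built from the bbox values in the park GeoJSON above, not the real park outline.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges a ray heading east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical stand-in boundary: the bounding box of the park file
min_lon, min_lat = -111.15354785438899, 44.13238494057162
max_lon, max_lat = -109.816111731858, 45.102865336505396
rough_boundary = [
    [min_lon, min_lat],
    [max_lon, min_lat],
    [max_lon, max_lat],
    [min_lon, max_lat],
    [min_lon, min_lat],
]

print(point_in_ring(-110.5, 44.6, rough_boundary))              # a point inside the box
print(point_in_ring(-122.8221664, 38.8175011, rough_boundary))  # The Geysers, CA
```

An odd number of crossings means the point is inside the ring, an even number means it is outside.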

The rest is up to you

So all it takes to import GeoJSON data into SQL Server is knowing how to use SQL Server's JSON functions.

Once geographical data is imported into geography data types, SQL Server's spatial functions offer lots of flexibility for how to efficiently slice and dice the data.

How To Create Multi-Object JSON Arrays in SQL Server


Recently I was discussing with Peter Saverman whether it would be possible to take some database tables that look like this:

[Image: the Home, Car, and Toy tables]

And output them so that the Cars and Toys data would map to a multi-object JSON array like so:

[Image: the desired JSON output]


Why would you ever need this?

If you are coming from a pure SQL background, at this point you might be wondering why you would ever want to create an object array that contains mixed object types.  Well, from an application development standpoint this type of scenario can be fairly common.

In a database, it makes sense to divide Home and Car and Toy into separate tables.  Sure, we could probably combine the latter two with some normalization, but imagine we will have many different types of entities that will be more difficult to normalize - sometimes it just makes sense to store this information separately.

Not to mention that performing analytical type queries across many rows of data will typically be much faster stored in this three table format.

The three table layout, while organized from a database standpoint, might not be the best way to organize the data in an object-oriented application.  Usually in a transaction oriented application, we want our data to all be together as one entity.  This is why NoSQL is all the rage among app developers.  Having all of your related data together makes it easy to manage, move, update, etc.  This is where the array of multi-type objects comes in - it'd be pretty easy to use this structure as an array of dynamic or inherited objects inside of our application.

Why not just combine these Car and Toy entities in the app?

Reading the data into the app through multiple queries and mapping that data to objects is usually the first way you would try doing something like this.

However, depending on many different variables, like the size of the data, the number of requests, the speed of the network, the hardware the app is running on, etc... mapping your data from multiple queries might not be the most efficient way to go.

On the other hand, if you have a big beefy SQL Server available that can do those transformations for you, and you are willing to pay for the processing time on an $8k/core enterprise licensed machine, then performing all of these transformations on your SQL Server is the way to go.

The solution

UPDATE: Jovan Popovic suggested an even cleaner solution using CONCAT_WS.  See the update at the bottom of this post.

First, here's the data if you want to play along at home:

DROP TABLE IF EXISTS ##Home;
GO
DROP TABLE IF EXISTS ##Car;
GO
DROP TABLE IF EXISTS ##Toy;
GO

CREATE TABLE ##Home
(
    HomeId int IDENTITY PRIMARY KEY,
    City nvarchar(20),
    State nchar(2)
);
GO

CREATE TABLE ##Car
(
    CarId int IDENTITY PRIMARY KEY,
    HomeId int,
    Year smallint,
    Make nvarchar(20),
    Model nvarchar(20),
    FOREIGN KEY (HomeId) REFERENCES ##Home(HomeId)
);
GO

CREATE TABLE ##Toy
(
    ToyId int IDENTITY PRIMARY KEY,
    HomeId int,
    Category nvarchar(20),
    RiderCapacity int,
    FOREIGN KEY (HomeId) REFERENCES ##Home(HomeId)
);
GO

INSERT INTO ##Home (City,State) VALUES ('Cleveland','OH')
INSERT INTO ##Home (City,State) VALUES ('Malibu','CA')

INSERT INTO ##Car (HomeId,Year, Make, Model) VALUES ('1','2017', 'Volkswagen', 'Golf')
INSERT INTO ##Car (HomeId,Year, Make, Model) VALUES ('2','2014', 'Porsche', '911')

INSERT INTO ##Toy (HomeId,Category, RiderCapacity) VALUES ('1','Bicycle', 1)
INSERT INTO ##Toy (HomeId,Category, RiderCapacity) VALUES ('2','Kayak', 2)

SELECT * FROM ##Home
SELECT * FROM ##Car
SELECT * FROM ##Toy

And here's the query that does all of the transforming:

SELECT 
    h.HomeId,
    h.City,
    h.State,
    GarageItems = JSON_QUERY('[' + STRING_AGG( GarageItems.DynamicData,',') + ']','$')
FROM
    ##Home h
    INNER JOIN
    (
        SELECT
            HomeId,
            JSON_QUERY(Cars,'$') AS DynamicData
        FROM
            ##Home h
            CROSS APPLY
            (
            SELECT 
                (
                SELECT  
                    *
                FROM
                    ##Car c
                WHERE
                    c.HomeId = h.HomeId
                    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
                ) AS Cars
            ) d 
        UNION ALL
        SELECT
            HomeId,
            JSON_QUERY(Cars,'$') AS DynamicData
        FROM
            ##Home h
            CROSS APPLY
            (
            SELECT 
                (
                SELECT  
                    *
                FROM
                    ##Toy c
                WHERE
                    c.HomeId = h.HomeId
                    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
                ) AS Cars
            ) d
    ) GarageItems
        ON h.HomeId = GarageItems.HomeId
GROUP BY
    h.HomeId,
    h.City,
    h.State

There are a couple of key elements that make this work.

CROSS APPLY

When using FOR JSON PATH, ALL rows and columns from that result set will get converted to a single JSON string.

This creates a problem if, for example, you want to have a column for your JSON string and a separate column for something like a foreign key (in our case, HomeId).  Or if you want to generate multiple JSON strings filtered on a foreign key.

The way I chose to get around this is to use CROSS APPLY with a join back to our Home table - this way we get our JSON string for either Cars or Toys created but then output it along with some additional columns.

WITHOUT_ARRAY_WRAPPER

When using FOR JSON PATH to turn a result set into a JSON string, SQL Server will automatically add square brackets around the JSON output as if it were an array.

This is a problem in our scenario because when we use FOR JSON PATH to turn the Car and Toy table into JSON strings, we eventually want to combine them together into the same array instead of two separate arrays.  The solution to this is using the WITHOUT_ARRAY_WRAPPER option to output the JSON string without the square brackets.
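The string handling at the heart of this is easier to see in isolation. The Python sketch below mimics it with the sample rows from above: each table's rows become a comma-separated run of JSON objects with no surrounding brackets, and the runs are then joined inside a single pair of brackets. The function names are hypothetical and this is only an illustration of the idea, not of how SQL Server builds the string internally.

```python
import json

cars = [{"CarId": 1, "HomeId": 1, "Year": 2017, "Make": "Volkswagen", "Model": "Golf"}]
toys = [{"ToyId": 1, "HomeId": 1, "Category": "Bicycle", "RiderCapacity": 1}]


def without_array_wrapper(rows):
    """Objects separated by commas, with no surrounding [ ]."""
    return ",".join(json.dumps(row) for row in rows)


def garage_items(*fragments):
    """Join the bracket-less fragments into one array, skipping empty ones."""
    return "[" + ",".join(f for f in fragments if f) + "]"


combined = garage_items(without_array_wrapper(cars), without_array_wrapper(toys))
print(combined)

# The result parses as a single array holding both object types
assert json.loads(combined) == cars + toys
```

Had each fragment kept its own brackets, the join would have produced an array of two arrays instead of one flat array of mixed objects.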

Conclusion

Your individual scenario and results may vary.  This solution was to solve a specific scenario in a specific environment.

Is it the right way to go about solving your performance problems all of the time? No.  But offloading these transformations onto SQL Server is an option to keep in mind.

Just remember - always test to make sure your performance changes are actually helping.

UPDATED Solution Using CONCAT_WS:

This solution recommended by Jovan Popovic is even easier than above.  It requires using CONCAT_WS, which is available starting in SQL Server 2017 (the above solution requires STRING_AGG which is also in 2017, but it could be rewritten using FOR XML string aggregation if necessary for earlier versions).

SELECT h.*,
'['+ CONCAT_WS(',',
(SELECT * FROM ##Car c WHERE c.HomeId = h.HomeId FOR JSON PATH, WITHOUT_ARRAY_WRAPPER),
(SELECT * FROM ##Toy t WHERE t.HomeId = h.HomeId FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
)
+ ']'
FROM ##Home h

5 Things You Need to Do When Performance Testing JSON in SQL and C#


Want to learn more about using JSON in SQL Server? Watch me present at the online GroupBy conference on June 9, 2017 at 8am.

I've written a few articles this year about how awesome JSON performance is in SQL Server 2016.

The more I continue to use JSON in SQL Server, the more impressed I become with its speed and versatility. Over time I've learned new techniques and realize that JSON in SQL Server is actually much faster than I initially thought.

Today I want to correct some performance tests where I think I unfairly compared SQL Server JSON performance the first time around.

Major thanks to @JovanPop_MSFT for his help with performance testing suggestions.

Performance testing is hard

Before I dive into the performance tests, I want to be clear that these tests are still not perfect.

Performance testing in SQL Server is hard enough. When you start trying to compare SQL Server functions to code in .NET, lots of other factors come into play.

I'll try to highlight where there still might be some problems with my methodology in the tests below, but overall I think these tests are more accurate comparisons of these features.

SQL Server JSON vs. Json.NET

There are two major issues with comparing SQL Server JSON functions to Json.NET functions in C#:

  1. Queries running in SQL Server Management Studio have significant overhead when rendering results to the results grid.
  2. The way SQL Server retrieves pages of data from disk or memory is not the same as how C# retrieves data from disk or memory.

The below tests should provide a more accurate comparison between SQL Server and .NET.

I am capturing SQL run times for the below tests using SET STATISTICS TIME ON. All of the test data for the below tests is available here: https://gist.github.com/bertwagner/f0645cf1b244af7d6bb75856db8744e0

Test #1 — Deserializing 20k JSON elements

For this first test, we will deserialize ~20k rows of car year-make-model data comparing the SQL Server OPENJSON function against Json.NET's DeserializeObject.

Previously this test used JSON_VALUE which was adding unnecessary processing overhead. With the query rewritten to run more efficiently, it looks like this:

SELECT year, make, model
FROM OPENJSON(@cars) WITH (year int, make nvarchar(50), model nvarchar(50));
-- 160ms

Now the problem with this query is that we are still drawing all ~20k rows of data to the screen in SQL Server Management Studio. The best way to avoid this extra processing is to simply convert the query to use COUNT:

SELECT COUNT(*)
FROM OPENJSON(@cars) WITH (year int, make nvarchar(50), model nvarchar(50));
-- 71ms


Looking at the execution plans, the OPENJSON function is still processing all ~20k rows in both queries, only the number of rows being brought back to the SSMS GUI differ.

This still isn't the same as what the C# test below does (all data in the C# example stays in memory at all times) but it is as close a comparison as I could think of:

var deserializedCars = JsonConvert.DeserializeObject<IEnumerable<Car>>(carsJSON);
// 66ms

(Full C# code available at: https://gist.github.com/bertwagner/8e5e8b6ec977c1704355166f96ae3efa)
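The principle behind both measurements is the same: time only the parsing, keep rendering out of the measurement, and run it more than once. As a language-neutral illustration, here is a small Python sketch of that kind of harness. The helper name and the generated test data are hypothetical, and the numbers it prints are not comparable to the SQL Server or Json.NET timings in this post.

```python
import json
import time


def best_time_ms(func, repeats=5):
    """Run func several times and return the fastest wall-clock time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()  # the result is discarded, so nothing is rendered or printed
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


# Hypothetical stand-in for the ~20k rows of car data
cars_json = json.dumps([{"year": 2017, "make": "Volkswagen", "model": "Golf"}] * 20000)

elapsed = best_time_ms(lambda: json.loads(cars_json))
print(f"deserialized 20000 objects, best of 5 runs: {elapsed:.1f}ms")
```

Taking the best of several runs is one way to reduce noise from caching and warm-up; reporting the median is another reasonable choice.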

And the result of this test? SQL Server was nearly as fast as Json.NET!


Test #2 — Deserializing ~20k rows with a predicate

In this next test we filter and return only a subset of rows.

SQL:

SELECT count(*) FROM OPENJSON(@cars) WITH(model nvarchar(20) ) WHERE model = 'Golf'
-- 58ms

C#:

var queriedCars = JsonConvert.DeserializeObject<IEnumerable<Car>>(carsJSON).Where(x => x.Model == "Golf");
// 52ms

Result: SQL Server is nearly as fast once again!


One more important thing to note about this specific test — if you add this data into a SQL table and add a computed column index, SQL Server will beat out Json.NET every time.

Test #3 — Serializing ~20k elements into JSON

This scenario is particularly difficult to test. If I want to serialize data in a SQL table to a JSON string, how do I write the equivalent of that in C#? Do I use a DataTable and hope that SQL's data is all in cache? Is the retrieval speed from the SQL Server buffer equivalent to that of C#'s DataTable? Would a collection of Lists in C# be more appropriate than a DataTable?

In the end, I decided to force SQL to read pages from disk by clearing the cache and have C# read the object data from a flat file. This still isn't perfect, but it is as close as I think we can get:

SQL:

DBCC DROPCLEANBUFFERS
SELECT * FROM dbo.Cars FOR JSON AUTO
-- 108ms

C#:

string carsJSONFromFile = File.ReadAllText(@"../../CarData.json");
var serializedCars = JsonConvert.SerializeObject(deserializedCars);
// 63ms

This test still isn't perfect though because SSMS is outputting the JSON string to the screen while C# never has to. I didn't want to play around with outputting the C# version to a form or the console window because it still wouldn't have been an equal comparison.

Result: Json.NET is roughly 1.7 times as fast (63ms vs. 108ms), although this test is by far the most inaccurate. Still, SQL is much faster than I initially thought.


SQL Server JSON vs. XML

In my previous article comparing SQL Server JSON to SQL Server XML, I focused on tests that were all done within SQL Server.

These tests were incomplete however: most of the time, a developer's app will have to do additional processing to get data into an XML format, while JSON data usually already exists in JSON format (assuming we have a JavaScript web app).

These two tests examine cases where XML may have been slightly faster than JSON on SQL Server, but if you consider the entire environment (app + database), using JSON wins.

Scenario #1 — XML data needs to be serialized

Although inserting XML data that is already in memory into a SQL Server table is faster than the equivalent operation in JSON, what happens if we need to serialize the data in our app first before sending the data to SQL Server?

// Serialize Car objects to XML
var result1 = SerializeToXML(cars);
// 166ms

// Serialize Car objects to JSON
var result2 = SerializeToJSON(cars);
// 69ms

public static Tuple<long, string> SerializeToXML(List<Car> cars)
{
  Stopwatch sw = new Stopwatch();
  sw.Start();
  StringWriter writer = new StringWriter();
  XmlSerializer serializer = new XmlSerializer(typeof(List<Car>));
  serializer.Serialize(writer, cars);
  string result = writer.ToString();
  sw.Stop();
  return new Tuple<long, string>(sw.ElapsedMilliseconds, result);
}

public static Tuple<long, string> SerializeToJSON(List<Car> cars)
{
  Stopwatch sw = new Stopwatch();
  sw.Start();
  var json = JsonConvert.SerializeObject(cars);
  sw.Stop();
  return new Tuple<long, string>(sw.ElapsedMilliseconds, json);
}

Using the most common libraries available for serializing data to XML and JSON, serializing data to JSON is more than twice as fast as serializing to XML (and as mentioned before, a lot of the time apps already have JSON data available, so there is no need to serialize at all). This means the app serialization code will run faster and allow for the data to make it to SQL Server faster.

Scenario #2 — Transferring XML and JSON to SQL Server

Finally, after we have our serialized XML and JSON data in C#, how long does it take to transfer that data to SQL Server?

// Write XML string to SQL XML column
var result3 = WriteStringToSQL(
  result1.Item2, 
  "INSERT INTO dbo.XmlVsJson (XmlData) VALUES (@carsXML)", 
  new SqlParameter[]
  {
    new SqlParameter("carsXML", result1.Item2)
  });
// 142ms, 1.88mb of data

// Write JSON string to SQL
var result4 = WriteStringToSQL(
  carsJSON,
  "INSERT INTO dbo.XmlVsJson (JsonData) VALUES (@carsJSON)",
  new SqlParameter[]
  {
    new SqlParameter("carsJSON", carsJSON)
  });
// 20ms, 1.45mb of data

// Write XML string to nvarchar SQL column.  Taking the difference between this and result3, 100ms+ of time is spent converting to XML format on insert.
var result5 = WriteStringToSQL(
  result1.Item2, 
  "INSERT INTO dbo.XmlVsJson (JSONData) VALUES (@carsXML)",
  new SqlParameter[]
  {
    new SqlParameter("carsXML", result1.Item2)
  });
// 29ms, 1.88mb of data

Result: Writing JSON data to an nvarchar SQL Server column is much faster than writing XML data to an XML typed (or even an nvarchar typed) column.

Not only does SQL Server need to parse the XML data upon insert, the physical size of the XML data being sent over TCP is larger due to the repetitive nature of XML syntax.

Conclusion

JSON performance in SQL Server is still awesome. In fact, it's even better than I had previously thought.

These tests are not meant to be conclusive; think of them more as errata for my previous JSON performance posts.

However, I think that these comparisons show that SQL Server's JSON functions are competitive with other languages' performance of handling JSON data.

Additionally, if serializing/deserializing reduces the amount of data transferred over TCP, using the JSON functions in SQL Server will most likely give you better total app/environment performance.

XML vs JSON Shootout: Which is Superior in SQL Server 2016?

"A duel is a duel" by Emanuele Rosso is licensed under CC BY-NC-ND 2.0


Additional performance comparisons available in an updated post.

Starting with the 2016 release, SQL Server offers native JSON support. Although the implementation is not perfect, I am still a huge fan.

Even if a new feature like JSON support is awesome, I am only likely to use it if it is practical and performs better than the alternatives.

Today I want to pit JSON against XML and see which is the better format to use in SQL Server.

Enter XML, SQL's Bad Hombre

Full disclosure: I don't love XML and I also don't love SQL Server's implementation of it.

XML is too wordy (lots of characters wasted on closing tags), it has elements AND attributes (I don't like having to program for two different scenarios), and depending on what language you are programming in, sometimes you need schema files and sometimes you don't.

SQL Server's implementation of XML does have some nice features like a dedicated datatype that reduces storage space and validates syntax, but I find the querying of XML to be clumsy.

All XML grievances aside, I am still willing to use XML if it outperforms JSON. So let's run some test queries!

Is JSON SQL Server's New Sheriff in Town?

Although performance is the final decider in these comparison tests, I think JSON has a head start over XML purely in terms of usability. SQL Server's JSON function signatures are easier to remember and cleaner to write on screen.

The test data I'm using is vehicle year/make/model data from https://github.com/arthurkao/vehicle-make-model-data. Here's what it looks like once I loaded it into a table called dbo.XmlVsJson:

CREATE TABLE dbo.XmlVsJson
(
  Id INT IDENTITY PRIMARY KEY,
  XmlData XML,
  JsonData NVARCHAR(MAX)
)

(The full data query is available in this gist if you want to play along at home)

Data Size

So XML should be larger right? It's got all of those repetitive closing tags?

SELECT
  DATALENGTH(XmlData)/1024.0/1024.0 AS XmlMB,
  DATALENGTH(JsonData)/1024.0/1024.0 AS JsonMB
FROM
  dbo.XmlVsJson

Turns out the XML is actually smaller! How can this be? This is the magic behind the SQL Server XML datatype. SQL Server doesn't store XML as a giant string; it stores only the XML InfoSet, leading to a reduction in space.

The JSON on the other hand is stored as regular old nvarchar(max) so its full string contents are written to disk. XML wins in this case.
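
To see the two storage paths side by side on a small scale, here's a minimal sketch using a made-up two-car document held in variables (not the real dataset). The byte count for the XML type depends on SQL Server's internal format, so no numbers are quoted here; the only thing safe to expect is that nvarchar stores two bytes per character for this plain ASCII text.

```sql
-- Hypothetical sample data, shaped like the paths used in this post
DECLARE @XmlText NVARCHAR(MAX) =
  N'<cars><car><make>ACURA</make><model>ILX</model></car>'
  + N'<car><make>HONDA</make><model>CIVIC</model></car></cars>';
DECLARE @Xml XML = @XmlText;  -- parsed and stored in the XML type's internal format
DECLARE @Json NVARCHAR(MAX) =
  N'{"cars":[{"make":"ACURA","model":"ILX"},{"make":"HONDA","model":"CIVIC"}]}';

SELECT
  DATALENGTH(@XmlText) AS XmlAsStringBytes,  -- the raw markup kept as a string
  DATALENGTH(@Xml)     AS XmlTypeBytes,      -- the same document in the XML type
  DATALENGTH(@Json)    AS JsonBytes;         -- JSON is always just a string
```

On a document this tiny the XML type may not come out ahead; the savings in the real table come from the tag names repeating many times.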

INSERT Performance

So XML physically stores less data in the XML data type than JSON does in the nvarchar(max) data type. Does that mean it will insert faster as well? Here's our query that tries to insert 100 duplicates of the row from our first query:

SET STATISTICS TIME ON

INSERT INTO dbo.XmlVsJson (XmlData)
SELECT XmlData FROM dbo.XmlVsJson 
  CROSS APPLY 
  (
    SELECT DISTINCT number 
    FROM master..spt_values 
    WHERE number BETWEEN 1 AND 100
  )t WHERE Id = 1
GO

INSERT INTO dbo.XmlVsJson (JsonData)
SELECT JsonData FROM dbo.XmlVsJson 
  CROSS APPLY 
  (
    SELECT DISTINCT number 
    FROM master..spt_values 
    WHERE number BETWEEN 1 AND 100
  )t WHERE Id = 1
GO

And the results? Inserting the 100 XML rows took 613ms on my machine, while inserting the 100 JSON rows took 1305ms…XML wins again!

JSON ain't looking too hot. Wait for it…

My guess is that because the XML data type physically stores less data, it can also be written out to the table faster.

CRUD Operations

I'm incredibly impressed by SQL Server's JSON performance when compared to .NET — but how does it compare to XML on SQL Server?

Read

Let's select the fragment for our second car from our XML and JSON:

SELECT t.XmlData.query('/cars/car[2]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

SELECT JSON_QUERY(t.JsonData, '$.cars[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

Result? JSON wins (at 0ms vs 63ms for XML) when needing to pluck out a fragment from our larger object string.

What if we want to grab a specific value instead of a fragment?

SELECT t.XmlData.value('(/cars/car[2]/model)[1]', 'varchar(100)') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

SELECT JSON_VALUE(t.JsonData, '$.cars[1].model') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

Once again JSON wins with 0ms vs 11ms for XML.
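
Two details in those queries are easy to trip over: XPath positions start at 1 while JSON array indexes start at 0, and JSON_QUERY returns objects or arrays while JSON_VALUE returns scalars. Here's a small sketch on made-up sample data (two cars, held in variables rather than the real table) that shows both:

```sql
-- Hypothetical sample data, shaped like the paths used in this post
DECLARE @Xml XML =
  N'<cars><car><make>ACURA</make><model>ILX</model></car>'
  + N'<car><make>HONDA</make><model>CIVIC</model></car></cars>';
DECLARE @Json NVARCHAR(MAX) =
  N'{"cars":[{"make":"ACURA","model":"ILX"},{"make":"HONDA","model":"CIVIC"}]}';

SELECT
  -- XPath: car[2] is the second car (1-based)
  @Xml.value('(/cars/car[2]/model)[1]', 'varchar(100)') AS XmlModel,
  -- JSON path: cars[1] is also the second car (0-based)
  JSON_VALUE(@Json, '$.cars[1].model')                  AS JsonModel,
  -- JSON_QUERY returns the whole object as a JSON fragment
  JSON_QUERY(@Json, '$.cars[1]')                        AS JsonFragment,
  -- JSON_VALUE on an object returns NULL in the default lax mode
  JSON_VALUE(@Json, '$.cars[1]')                        AS ObjectViaJsonValue;
```

Both model columns should come back as CIVIC, and the last column as NULL.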

If you look at the execution plans for these last two queries, it's easy to see that XML has a lot more to do behind the scenes to retrieve the data:

(Execution plan screenshots: one for the XML query, one for the JSON query.)

Create

We saw above that inserting rows of XML data is faster than inserting rows of JSON, but what if we want to insert new data into the object strings themselves? Here I want to insert the property "mileage" into the first car object:

UPDATE t SET XmlData.modify('
insert <mileage>100,000</mileage>
into (/cars/car[1])[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

UPDATE t SET JsonData = JSON_MODIFY(JsonData,
'$.cars[0].mileage','100,000') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

In addition to the cleaner syntax (JSON_MODIFY() is called much like REPLACE()), the JSON insert runs in 22ms compared to 206ms for XML. Another JSON win.
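
One thing worth noticing about that JSON_MODIFY() call: '100,000' is passed as a string, so it gets stored as a JSON string. Here's a minimal sketch on a made-up one-car document (not the real table) comparing that with passing an INT, which stores an unquoted JSON number instead:

```sql
-- Hypothetical sample data
DECLARE @Json NVARCHAR(MAX) = N'{"cars":[{"make":"ACURA","model":"ILX"}]}';

-- The mileage path doesn't exist yet, so lax mode (the default) adds the property
DECLARE @AsString NVARCHAR(MAX) = JSON_MODIFY(@Json, '$.cars[0].mileage', '100,000');

-- Passing a numeric value stores a JSON number rather than a quoted string
DECLARE @AsNumber NVARCHAR(MAX) = JSON_MODIFY(@Json, '$.cars[0].mileage', 100000);

SELECT @AsString AS MileageAsString, @AsNumber AS MileageAsNumber;
```

Which one you want depends on how the value gets read later; the timings in this post all use the string form.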

Update

Let's update the mileage property we just added to have a value of 110,000:

UPDATE t SET XmlData.modify('
replace value of (/cars/car[1]/mileage/text())[1]
with "110,000"') 
FROM dbo.XmlVsJson t
WHERE Id = 1

UPDATE t SET JsonData = JSON_MODIFY(JsonData, '$.cars[0].mileage','110,000') 
FROM dbo.XmlVsJson t
WHERE Id = 1

Result? JSON has the quicker draw and was able to perform this update in 54ms vs XML's 194ms.

Delete

Deleting large string data, a DBA's dream *snicker*.

Let's delete the mileage property, undoing all of that hard work we just did:

UPDATE t SET XmlData.modify('
delete /cars/car[1]/mileage[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

UPDATE t SET JsonData = JSON_MODIFY(JsonData, '$.cars[0].mileage', null) 
FROM dbo.XmlVsJson t 
WHERE Id = 1

JSON doesn't take any time to reload and wins against XML again 50ms to 159ms.
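
The delete works because JSON_MODIFY() runs in lax mode by default, where a NULL value removes the property. With the strict prefix on the path, the property is kept and set to JSON null instead. A minimal sketch on a made-up single-car object:

```sql
-- Hypothetical sample data
DECLARE @Json NVARCHAR(MAX) = N'{"make":"ACURA","mileage":"110,000"}';

-- Lax mode (default): NULL deletes the existing property
DECLARE @LaxResult NVARCHAR(MAX) = JSON_MODIFY(@Json, '$.mileage', NULL);

-- Strict mode: NULL keeps the property and sets its value to null
DECLARE @StrictResult NVARCHAR(MAX) = JSON_MODIFY(@Json, 'strict $.mileage', NULL);

SELECT @LaxResult AS LaxRemovesKey, @StrictResult AS StrictKeepsKeyAsNull;
```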

Read Part 2: Indexes

So above we saw that JSON was faster than XML at reading fragments and properties from a single row of serialized data. But our SQL Servers probably have LOTS of rows of data. How well does indexed data parsing do in our matchup?

First let's expand our data — instead of storing all of our car objects in a single field, let's build a new table that has each car on its own row:

(once again, full dataset at GitHub if you are playing along at home)

Now that we have our expanded data in our table, let's add some indexes. The XML datatype in SQL Server has its own types of indexes, while JSON simply needs a computed column with a regular index applied to it.

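-- XML: a primary XML index persists a shredded copy of the XML column
DROP INDEX IF EXISTS PXML_XmlData ON dbo.XmlVsJson2;
CREATE PRIMARY XML INDEX PXML_XmlData
ON dbo.XmlVsJson2 (XmlData);

-- JSON: expose the property as a computed column, then index it like any other column
ALTER TABLE dbo.XmlVsJson2
ADD MakeComputed AS JSON_VALUE(JsonData, '$.make');
CREATE NONCLUSTERED INDEX IX_JsonData ON dbo.XmlVsJson2 (MakeComputed);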

(Note: I also tried adding an XML secondary index for even better performance, but I couldn't get the query engine to use that secondary index on such a basic dataset)

If we try to find all rows that match a predicate:

SELECT Id, XmlData 
FROM dbo.XmlVsJson2 t 
WHERE t.XmlData.exist('/car/make[.="ACURA"]') = 1

SELECT Id, JsonData 
FROM dbo.XmlVsJson2 t 
WHERE JSON_VALUE(t.JsonData, '$.make') = 'ACURA'

XML is able to filter out 96 rows in 200ms and JSON accomplishes the same in 9ms. A final win for JSON.
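
If you want to try the computed-column approach without loading the full dataset, here's a minimal sketch on a throwaway temp table. #Cars, IX_Cars_Make, and the three rows are all made up for illustration; only the JSON_VALUE expression matches the one used above.

```sql
-- Hypothetical temp table with a computed column over the JSON "make" property
CREATE TABLE #Cars
(
  Id INT IDENTITY PRIMARY KEY,
  JsonData NVARCHAR(MAX),
  MakeComputed AS JSON_VALUE(JsonData, '$.make')
);

-- A regular nonclustered index on the computed column.
-- SQL Server may warn about the maximum key length here because
-- JSON_VALUE returns nvarchar(4000); the index is still created.
CREATE NONCLUSTERED INDEX IX_Cars_Make ON #Cars (MakeComputed);

INSERT INTO #Cars (JsonData)
VALUES
  (N'{"make":"ACURA","model":"ILX"}'),
  (N'{"make":"ACURA","model":"MDX"}'),
  (N'{"make":"HONDA","model":"CIVIC"}');

-- The predicate repeats the computed column's expression exactly,
-- which is what lets the optimizer consider the index
SELECT Id, JsonData
FROM #Cars
WHERE JSON_VALUE(JsonData, '$.make') = 'ACURA';
```

With three rows the optimizer may well decide a scan is cheaper, so check the execution plan on a realistically sized table before drawing conclusions. The temp table goes away when the session ends.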

Conclusion

If you need to store and manipulate serialized string data in SQL Server, there's no question: JSON is the format of choice. Although JSON's storage size is a little larger than its XML predecessor, SQL Server's JSON functions outperform XML in speed in nearly all cases.

Is there enough performance difference to rewrite all of your old XML code to JSON? Probably not, but every case is different.

One thing is clear: new development should consider taking advantage of SQL Server's new JSON functions.