Searching Complex JSON Data

Published on: 2019-02-26

Watch this week’s video on YouTube

Computed column indexes make querying JSON data fast and efficient, especially when the schema of the JSON data is the same throughout a table.

It’s also possible to break out a well-known complex JSON structure into multiple SQL Server tables.

However, what happens if you have different JSON structures being stored in each row of your database and you want to write efficient search queries against all of the rows of your complex JSON strings?

Complex JSON

Let’s start out by creating a staging table that contains various fragments of JSON stored in a nvarchar column:

DROP TABLE IF EXISTS dbo.ImportedJson;
GO
CREATE TABLE dbo.ImportedJson
(
    Id int IDENTITY,
    JsonValue nvarchar(max)
);
GO

INSERT INTO dbo.ImportedJson (JsonValue) VALUES (N'{ 
    "Property1" : "Value1", 
    "Property2" : [1,2,3]
}');

INSERT INTO dbo.ImportedJson (JsonValue) VALUES (N'{ 
    "Property1" : "Value2", 
    "Property3" : [1,2,3], 
    "Property4" : ["A","B","C",null], 
    "Property5" : { 
                    "SubProp1": "A", 
                    "SubProp2": { 
                                    "SubSubProp1":"B", 
                                    "SubSubProp2": 1.2,
                                    "SubSubProp3" : true
                                } 
                    }, 
    "Property6" : [{"ArrayProp":"A"},{"ArrayProp":"B"}], 
    "Property7" : 123, 
    "Property8" : null 
}');

INSERT INTO dbo.ImportedJson (JsonValue) VALUES (N'{ 
    "Property8" : "Not null", 
    "Property9" : [4,5,6]
}');


SELECT * FROM dbo.ImportedJSON;

And the results: 

Search Queries

If I want to search these values I have a few options.

First, I could write something like:

SELECT * FROM dbo.ImportedJSON WHERE JsonValue LIKE '%Property4" : "["A%';

But that technique is difficult to use on data that I’m not familiar with, and it will run slowly because it won’t be able to seek to the data in any indexes.

A second option is to create something like a full text index, but unlike full text indexes on XML columns, I will have to fight with all of the quotes and colons and curly braces since there is no support for JSON. Yuck.

Option 3: Search Table

Option 3 is my favorite: normalize the data into a key and value columns that are easy to search:

WITH JSONRoot AS ( 
    SELECT 
        Id as RowId,
        CAST(hierarchyid::GetRoot().ToString() + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS NVARCHAR(4000)) + '/' AS NVARCHAR(4000)) as [HierarchyId], 
        [key],
        [value],
        CAST([type] AS INT) AS [type] 
    FROM 
        dbo.ImportedJson
        CROSS APPLY OPENJSON(JsonValue,'$') 
    UNION ALL 
    SELECT 
        RowId,
        CAST(JSONRoot.[HierarchyId] + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS NVARCHAR(4000)) + '/' AS NVARCHAR(4000)), 
        CASE WHEN JSONRoot.[type] = 4 THEN JSONRoot.[key]+'['+t.[key]+']' ELSE t.[key] END,
        t.[value],
        CAST(t.[type] AS INT) 
    FROM 
        JSONRoot 
        CROSS APPLY OPENJSON(JSONRoot.[value],'$') t 
    WHERE 
        JSONRoot.[type] > 3 /* Only parse complex data types */

SELECT 
    RowId,
    CAST([HierarchyId] AS HierarchyId) AS [HierarchyId],
    [key],
    [value],
    [type]
FROM 
    JSONRoot 
ORDER BY 
    RowId,
    [HierarchyId]
GO

Results:

This query parses each property of the original JSON input so that each key-value pair gets put on its row. Complex JSON objects are broken out into multiple rows, and a HierarchyId is included to maintain parent-child relationships if needed.

Having all of this complex JSON parsed out into a key value table now opens up possibilities of what we can do with it.

Process and Indexing

The above query isn’t going to run itself. You’ll either need to schedule it or incorporate it into an ETL to parse out your staged JSON data on a regular basis (kind of like full text indexing works asyncronously).

Alternatively you can write the logic into a trigger that fires on new row inserts into your staging table if you need this data in real-time. As with all triggers though, I wouldn’t recommend this if your staging table is getting rows added at a high rate.

Once you decide how to store your parsed JSON data, add some indexes that will help your search queries run nice and fast (CREATE NONCLUSTERED INDEX IX_Value_Include ON dbo.ParsedJSON ([value]) INCLUDE ([key],RowId) would probably be a good starting point for many search queries) and you’ll be doing a lot better than WHERE JsonValue LIKE '%Property4%'.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Extracting JSON Values Longer Than 4000 Characters

Published on: 2018-09-18

A while back I built an automated process that parses JSON strings into a relational format.

Up until recently this process had been working great: my output table had all of the data I was expecting, neatly parsed into the correct rows and columns.

Last week I noticed an error in the output table however.  One row that was supposed to have a nicely parsed JSON value for a particular column had an ugly NULL instead.

Truncated?

First I checked my source JSON string – it had the “FiveThousandAs” property I was looking for:

DECLARE @json nvarchar(max) = N'{
    "Id" : 1,
    "FiveThousandAs" : "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "FourAs" : "aaaa"
}';

So the source data was fine.

I checked the table column I was inserting into as well and confirmed it was defined as nvarchar(max), so no problem there.

The last thing I checked was the query I was using:

SELECT JSON_VALUE(@json, '$.FiveThousandAs')

If I run that on it’s own, I reproduce the NULL I was seeing inserted into my table:

JSON_VALUE is limiting

After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters.   Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently.

To force an error in future code I could use SELECT JSON_VALUE(@json, ‘strict $.FiveThousandAs’)  so at least I would be notified immediately of an problem with my  query/data (via failure).

Although strict mode will notify me of issues sooner, it still doesn’t help me extract all of the data from my JSON property.

(Side note: I couldn’t define my nvarchar(max) column as NOT NULL because for some rows the value could be NULL, but in the future I might consider adding additional database validation with a check constraint).

OPENJSON

The solution to reading the entire 5000 character value from my JSON property is to use OPENJSON:

SELECT * 
FROM OPENJSON(@json) 
WITH (FiveThousandAs nvarchar(max) '$.FiveThousandAs')

My insert query needed to be slightly refactored, but now I’m able to return any length value (as long as it’s under 2gb).

In hindsight, I should have used OPENJSON() from the start: not only is it capable of parsing the full length values from JSON strings, but it performs significantly faster than any of the other SQL Server JSON functions.

As a best practice, I think I’m going to use OPENJSON by default for any JSON queries to avoid problems like this in the future.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!


6 Techniques For Troubleshooting Your Code

Published on: 2018-08-14

This post is a response to this month’s T-SQL Tuesday #105 prompt by Wayne Sheffield.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month’s topic asks to share a time you ran into a metaphorical brick wall and how you worked it out.


Watch this week’s education film on YouTube.

The Problem: Trimmed JSON Values

Recently I was using FOR JSON PATH to generate a JSON string from a query to use in a web app:

SELECT TOP 100 *
FROM master..spt_values
WHERE type='P'
FOR JSON PATH

The resulting JSON string is 5,580 characters long.

The goal was to read this query result into my .NET app like so:

var jsonResult = db.Database.SqlQuery<string>("SELECT TOP 100 * FROM ... FOR JSON PATH");

Is this the best way to design every app/database interaction?  Probably not, but it was right for this scenario because I didn’t want to create models for all of the dynamic data I’d be returning.

Design decisions aside, my “brick wall” issue was that my “jsonResult” value was getting truncated around 2,000 characters instead of displaying the full 5,580.  The JSON string looked great in SSMS, but for the life of me I couldn’t figure out why the data was getting chopped off when read into .NET.

Time to Debug

What follows are the usual steps I take when debugging a problem that has me stumped.  I’ve turned these into a 1950s style educational film so you can laugh at my bad jokes while your learn.

1. Rubber Duck Debugging

The first thing I usually do when I hit a wall like this is talk myself through the problem again.

This technique usually works well for me and is equivalent to those times when you ask  someone for help but realize the solution while explaining the problem to them.

To save yourself embarrassment (and to let your coworkers keep working uninterrupted), people often substitute an inanimate object, like a rubber duck, instead of a coworker to try and work out the problem on their own.

Alas, in this case explaining the problem to myself didn’t help, so I moved on to the next technique.

2. Simplify the Problem

Breaking a problem down into smaller solvable parts can help sometimes.  I changed my query to return less data by switching to SELECT TOP 5 and seeing if .NET was still truncating the data.  It wasn’t! Mildly successful!

In this case though, I couldn’t really build off my simplified example.  As soon as my result passed ~2,000 characters, the JSON string was getting chopped off.

In this step I also figured out if I put my query into a derived table, my .NET code worked beautifully and returned the complete JSON string:

SELECT * FROM
(
SELECT TOP 100 *
FROM master..spt_values
WHERE type='P'
FOR JSON PATH
) t(c)

This was an ugly solution though and I didn’t like doing it.  I especially didn’t like it because I didn’t know why a derived table fixed the output.

3. Check the Internet

As great as the internet is, I try to limit myself to how much searching I do on it when troubleshooting.  Searching for an online solution can quickly devolve into wasting two hours with nothing to show.

I performed some cursory searches on Google, StackOverflow, various forums, blogs, etc… but didn’t find anything helpful (fun/sad fact: I searched for solutions again while typing up this post and now find plenty of solutions…who knows what I was doing wrong that day).

4. Ask a Friend

I love working through problems with others because I’m often amazed at how differently others approach a problem.  Often times this leads to a solution I would not have thought of on my own.  I especially enjoy hearing from people new to the subject area because they often have the most creative solutions due to not yet having become cynical and jaded from previous experience :).

I try to hold off on this option until at least trying all of the above techniques because 1) I hate breaking another person’s concentration 2) I feel like I learn better if I struggle through a problem myself.

And in this case shopping the problem around didn’t help – no one I talked to had a great solution.

5. Take a Break

After trying all of the above, I was out of ideas.  I took a break from the problem for the rest of the day, resolved to give it another try in the morning the morning.

6. RTFM

And the next morning, I had the idea to check the documentation to see what it said about the return type of FOR JSON PATH.

Embarrassingly, this one should be way higher on the list, and I’d like to say that it usually is, but for one reason or another I didn’t bother checking until this late in the game.

And wouldn’t you know it?  The last paragraph, of the last section, tells me exactly what I needed to know.

The documentation tells me that the JSON string will be broken up across multiple rows and my client app needs to concatenate them all together.  What I ended up doing is a simple String.Join():

var jsonResult = String.Join("",db.Database.SqlQuery<string>("SELECT TOP 100 * FROM ... FOR JSON PATH"));

There’s no explanation for why SSMS is able to concatenate these values together but other client apps have to manually do so, but at least I found my documented solution.

Conclusion

Even though I found a somewhat-satisfactory solution in the documentation, my fall back was going to be to use the ugly derived table solution discovered in step 2.  It was ugly, but at some point I would have to call it quits and settle with an ugly workaround rather than spend more time on troubleshooting.

Next time I’ll be sure to check the documentation earlier in the process and hopefully that will save me from some of the frustration I encountered in this particular scenario.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!