Data with Bert logo

Searching Complex JSON Data

Watch this week's video on YouTube

Computed column indexes make querying JSON data fast and efficient, especially when the schema of the JSON data is the same throughout a table.

It's also possible to break out a well-known complex JSON structure into multiple SQL Server tables.

However, what happens if you have different JSON structures being stored in each row of your database and you want to write efficient search queries against all of the rows of your complex JSON strings?

Complex JSON

Let's start out by creating a staging table that contains various fragments of JSON stored in a nvarchar column:

DROP TABLE IF EXISTS dbo.ImportedJson;
GO
CREATE TABLE dbo.ImportedJson
(
    Id int IDENTITY,
    JsonValue nvarchar(max)
);
GO

INSERT INTO dbo.ImportedJson (JsonValue) VALUES (N'{ 
    "Property1" : "Value1", 
    "Property2" : [1,2,3]
}');

INSERT INTO dbo.ImportedJson (JsonValue) VALUES (N'{ 
    "Property1" : "Value2", 
    "Property3" : [1,2,3], 
    "Property4" : ["A","B","C",null], 
    "Property5" : { 
                    "SubProp1": "A", 
                    "SubProp2": { 
                                    "SubSubProp1":"B", 
                                    "SubSubProp2": 1.2,
                                    "SubSubProp3" : true
                                } 
                    }, 
    "Property6" : [{"ArrayProp":"A"},{"ArrayProp":"B"}], 
    "Property7" : 123, 
    "Property8" : null 
}');

INSERT INTO dbo.ImportedJson (JsonValue) VALUES (N'{ 
    "Property8" : "Not null", 
    "Property9" : [4,5,6]
}');


SELECT * FROM dbo.ImportedJSON;

And the results: 

2019-02-21-06-54-22

Search Queries

If I want to search these values I have a few options.

First, I could write something like:

SELECT * FROM dbo.ImportedJSON WHERE JsonValue LIKE '%Property4" : "["A%';

But that technique is difficult to use on data that I'm not familiar with, and it will run slowly because it won't be able to seek to the data in any indexes.

A second option is to create something like a full text index, but unlike full text indexes on XML columns, I will have to fight with all of the quotes and colons and curly braces since there is no support for JSON. Yuck.

Option 3: Search Table

Option 3 is my favorite: normalize the data into a key and value columns that are easy to search:

WITH JSONRoot AS ( 
    SELECT 
        Id as RowId,
        CAST(hierarchyid::GetRoot().ToString() + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS NVARCHAR(4000)) + '/' AS NVARCHAR(4000)) as [HierarchyId], 
        [key],
        [value],
        CAST([type] AS INT) AS [type] 
    FROM 
        dbo.ImportedJson
        CROSS APPLY OPENJSON(JsonValue,'$') 
    UNION ALL 
    SELECT 
        RowId,
        CAST(JSONRoot.[HierarchyId] + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS NVARCHAR(4000)) + '/' AS NVARCHAR(4000)), 
        CASE WHEN JSONRoot.[type] = 4 THEN JSONRoot.[key]+'['+t.[key]+']' ELSE t.[key] END,
        t.[value],
        CAST(t.[type] AS INT) 
    FROM 
        JSONRoot 
        CROSS APPLY OPENJSON(JSONRoot.[value],'$') t 
    WHERE 
        JSONRoot.[type] > 3 /* Only parse complex data types */
) 
SELECT 
    RowId,
    CAST([HierarchyId] AS HierarchyId) AS [HierarchyId],
    [key],
    [value],
    [type]
FROM 
    JSONRoot 
ORDER BY 
    RowId,
    [HierarchyId]
GO

Results:

2019-02-21-11-12-41

This query parses each property of the original JSON input so that each key-value pair gets put on its row. Complex JSON objects are broken out into multiple rows, and a HierarchyId is included to maintain parent-child relationships if needed.

Having all of this complex JSON parsed out into a key value table now opens up possibilities of what we can do with it.

Process and Indexing

The above query isn't going to run itself. You'll either need to schedule it or incorporate it into an ETL to parse out your staged JSON data on a regular basis (kind of like full text indexing works asyncronously).

Alternatively you can write the logic into a trigger that fires on new row inserts into your staging table if you need this data in real-time. As with all triggers though, I wouldn't recommend this if your staging table is getting rows added at a high rate.

Once you decide how to store your parsed JSON data, add some indexes that will help your search queries run nice and fast (CREATE NONCLUSTERED INDEX IX_Value_Include ON dbo.ParsedJSON ([value]) INCLUDE ([key],RowId) would probably be a good starting point for many search queries) and you'll be doing a lot better than WHERE JsonValue LIKE '%Property4%'.

Extracting JSON Values Longer Than 4000 Characters

Watch this week's video on YouTube

A while back I built an automated process that parses JSON strings into a relational format.

Up until recently this process had been working great: my output table had all of the data I was expecting, neatly parsed into the correct rows and columns.

Last week I noticed an error in the output table however.  One row that was supposed to have a nicely parsed JSON value for a particular column had an ugly NULL instead.

Truncated?

First I checked my source JSON string - it had the "FiveThousandAs" property I was looking for:

DECLARE @json nvarchar(max) = N'{
    "Id" : 1,
    "FiveThousandAs" : "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "FourAs" : "aaaa"
}';

So the source data was fine.

I checked the table column I was inserting into as well and confirmed it was defined as nvarchar(max), so no problem there.

The last thing I checked was the query I was using:

SELECT JSON_VALUE(@json, '$.FiveThousandAs')

If I run that on it's own, I reproduce the NULL I was seeing inserted into my table:

2018-09-15_18-26-27

JSON_VALUE is limiting

After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters.   Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently.

To force an error in future code I could use SELECT JSON_VALUE(@json, 'strict \$.FiveThousandAs')  so at least I would be notified immediately of an problem with my  query/data (via failure).

Although strict mode will notify me of issues sooner, it still doesn't help me extract all of the data from my JSON property.

(Side note: I couldn't define my nvarchar(max) column as NOT NULL because for some rows the value could be NULL, but in the future I might consider adding additional database validation with a check constraint).

OPENJSON

The solution to reading the entire 5000 character value from my JSON property is to use OPENJSON:

SELECT * 
FROM OPENJSON(@json) 
WITH (FiveThousandAs nvarchar(max) '$.FiveThousandAs')

My insert query needed to be slightly refactored, but now I'm able to return any length value (as long as it's under 2gb).

In hindsight, I should have used OPENJSON() from the start: not only is it capable of parsing the full length values from JSON strings, but it performs significantly faster than any of the other SQL Server JSON functions.

As a best practice, I think I'm going to use OPENJSON by default for any JSON queries to avoid problems like this in the future.