Extracting JSON Values Longer Than 4000 Characters

Published on: 2018-09-18

A while back I built an automated process that parses JSON strings into a relational format.

Up until recently this process had been working great: my output table had all of the data I was expecting, neatly parsed into the correct rows and columns.

Last week I noticed an error in the output table however.  One row that was supposed to have a nicely parsed JSON value for a particular column had an ugly NULL instead.

Truncated?

First I checked my source JSON string – it had the “FiveThousandAs” property I was looking for:

So the source data was fine.

I checked the table column I was inserting into as well and confirmed it was defined as nvarchar(max), so no problem there.

The last thing I checked was the query I was using:

If I run that on it’s own, I reproduce the NULL I was seeing inserted into my table:

JSON_VALUE is limiting

After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters.   Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently.

To force an error in future code I could use SELECT JSON_VALUE(@json, 'strict $.FiveThousandAs')  so at least I would be notified immediately of an problem with my  query/data (via failure).

Although strict mode will notify me of issues sooner, it still doesn’t help me extract all of the data from my JSON property.

(Side note: I couldn’t define my nvarchar(max) column as NOT NULL because for some rows the value could be NULL, but in the future I might consider adding additional database validation with a check constraint).

OPENJSON

The solution to reading the entire 5000 character value from my JSON property is to use OPENJSON:

My insert query needed to be slightly refactored, but now I’m able to return any length value (as long as it’s under 2gb).

In hindsight, I should have used OPENJSON() from the start: not only is it capable of parsing the full length values from JSON strings, but it performs significantly faster than any of the other SQL Server JSON functions.

As a best practice, I think I’m going to use OPENJSON by default for any JSON queries to avoid problems like this in the future.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

6 Techniques For Troubleshooting Your Code

Published on: 2018-08-14

This post is a response to this month’s T-SQL Tuesday #105 prompt by Wayne Sheffield.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month’s topic asks to share a time you ran into a metaphorical brick wall and how you worked it out.


Watch this week’s education film on YouTube.

The Problem: Trimmed JSON Values

Recently I was using FOR JSON PATH to generate a JSON string from a query to use in a web app:

The resulting JSON string is 5,580 characters long.

The goal was to read this query result into my .NET app like so:

Is this the best way to design every app/database interaction?  Probably not, but it was right for this scenario because I didn’t want to create models for all of the dynamic data I’d be returning.

Design decisions aside, my “brick wall” issue was that my “jsonResult” value was getting truncated around 2,000 characters instead of displaying the full 5,580.  The JSON string looked great in SSMS, but for the life of me I couldn’t figure out why the data was getting chopped off when read into .NET.

Time to Debug

What follows are the usual steps I take when debugging a problem that has me stumped.  I’ve turned these into a 1950s style educational film so you can laugh at my bad jokes while your learn.

1. Rubber Duck Debugging

The first thing I usually do when I hit a wall like this is talk myself through the problem again.

This technique usually works well for me and is equivalent to those times when you ask  someone for help but realize the solution while explaining the problem to them.

To save yourself embarrassment (and to let your coworkers keep working uninterrupted), people often substitute an inanimate object, like a rubber duck, instead of a coworker to try and work out the problem on their own.

Alas, in this case explaining the problem to myself didn’t help, so I moved on to the next technique.

2. Simplify the Problem

Breaking a problem down into smaller solvable parts can help sometimes.  I changed my query to return less data by switching to SELECT TOP 5 and seeing if .NET was still truncating the data.  It wasn’t! Mildly successful!

In this case though, I couldn’t really build off my simplified example.  As soon as my result passed ~2,000 characters, the JSON string was getting chopped off.

In this step I also figured out if I put my query into a derived table, my .NET code worked beautifully and returned the complete JSON string:

This was an ugly solution though and I didn’t like doing it.  I especially didn’t like it because I didn’t know why a derived table fixed the output.

3. Check the Internet

As great as the internet is, I try to limit myself to how much searching I do on it when troubleshooting.  Searching for an online solution can quickly devolve into wasting two hours with nothing to show.

I performed some cursory searches on Google, StackOverflow, various forums, blogs, etc… but didn’t find anything helpful (fun/sad fact: I searched for solutions again while typing up this post and now find plenty of solutions…who knows what I was doing wrong that day).

4. Ask a Friend

I love working through problems with others because I’m often amazed at how differently others approach a problem.  Often times this leads to a solution I would not have thought of on my own.  I especially enjoy hearing from people new to the subject area because they often have the most creative solutions due to not yet having become cynical and jaded from previous experience :).

I try to hold off on this option until at least trying all of the above techniques because 1) I hate breaking another person’s concentration 2) I feel like I learn better if I struggle through a problem myself.

And in this case shopping the problem around didn’t help – no one I talked to had a great solution.

5. Take a Break

After trying all of the above, I was out of ideas.  I took a break from the problem for the rest of the day, resolved to give it another try in the morning the morning.

6. RTFM

And the next morning, I had the idea to check the documentation to see what it said about the return type of FOR JSON PATH.

Embarrassingly, this one should be way higher on the list, and I’d like to say that it usually is, but for one reason or another I didn’t bother checking until this late in the game.

And wouldn’t you know it?  The last paragraph, of the last section, tells me exactly what I needed to know.

The documentation tells me that the JSON string will be broken up across multiple rows and my client app needs to concatenate them all together.  What I ended up doing is a simple String.Join():

There’s no explanation for why SSMS is able to concatenate these values together but other client apps have to manually do so, but at least I found my documented solution.

Conclusion

Even though I found a somewhat-satisfactory solution in the documentation, my fall back was going to be to use the ugly derived table solution discovered in step 2.  It was ugly, but at some point I would have to call it quits and settle with an ugly workaround rather than spend more time on troubleshooting.

Next time I’ll be sure to check the documentation earlier in the process and hopefully that will save me from some of the frustration I encountered in this particular scenario.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Converting JSON to SQL Server CREATE TABLE Statements

Published on: 2018-05-22

Watch this week’s episode on YouTube.

Tedious, repetitive tasks are the bane of any lazy programmer.  I know, because I am one.

One such repetitive task that I find comparable to counting grains of rice is building database layouts from JSON data sources.

While some online services exist that will parse JSON objects into database structures, I don’t like using them because I don’t trust the people running those sites with my data.  Nothing personal against them, I just don’t want to be passing my data through their servers.

My solution to this problem was to write a query that will parse my unfamiliar JSON documents into a series of CREATE TABLE statements.

Automatically Generating A SQL Database Schema From JSON

You can always get the most recent version of the query from GitHub, but I’ll post the current version below so that it’s easier to explain in this post:

In the variables section, we can define our input JSON document string as well as define things like a root table name and default database schema name.

There is also a string padding variable.  This padding variable’s value is added to the max value length found in each column being generated, giving each column a little bit more breathing room.

Next in the script is the recursive CTE that parses the JSON string.  The OPENJSON() function in SQL Server makes this part relatively easy since some of the work of determining datatypes is already done for you.

I’ve taken the liberty to convert all strings to nvarchar types, numbers to either floats or ints, booleans to bits, and datetime strings to datetime2s.

Two additional CTE expressions add an integer IDENTITY PRIMARY KEY column to each table as well as a column referencing the parent table if applicable (our foreign key column).

Finally, a little bit of dynamic SQL pieces together all of these components to generate our CREATE TABLE scripts.

Limitations

I created this code with a lot of assumptions about my (unfamiliar) JSON data sets.  For the purpose of roughly building out tables from large JSON files, I don’t need the results to be perfect and production-ready; I just want the results to be mostly correct so the vast majority of tedious table creation work is automated.

With that disclaimer made, here are a few things to be aware of:

  • Sometimes there will be duplicate column names generated because of naming – just delete one.
  • While foreign key columns exist, the foreign key constraints don’t.
  • This code uses STRING_AGG.  I’ll leave it up to you to convert to STUFF and FOR XML PATH if you need to run it in versions prior to 2017.

Summary

This script is far from perfect.  But it has eliminated the need for me to build out these tables and columns from scratch.  Sure, the output sometimes needs a tweak or too, but for my purposes I’m happy with how it turned out.  I hope it helps you eliminate some boring table creation work too.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!