Shortchanged with International Money in SQL Server

Published on: 2018-09-25

Imagine you have to perform some salary analysis for your employer International Mega Corp.

The data you have to work with looks something like this:
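
The original table isn't reproduced here, but a minimal stand-in might look something like this (the table name, column names, and salary values are purely illustrative):

```sql
CREATE TABLE dbo.Salaries
(
    EmployeeId int IDENTITY(1,1),
    Country nvarchar(50),
    Salary nvarchar(20)  -- stored pre-formatted for display
);

INSERT INTO dbo.Salaries (Country, Salary) VALUES
(N'United States', N'63,387.00'),
(N'United States', N'78,321.50'),
(N'Turkey',        N'6.338,70'),
(N'Turkey',        N'120.500,00');
```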

Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods?

Great question!  Someone wanted to make sure these amounts would look good in the UI, so they decided that storing the formatted values in the database was the way to go…

Pretty for the UI, but not so great for doing analysis on.

Thanks to Zanoni Labuschagne, one of the subscribers to my YouTube channel, for recommending this topic!

CONVERT!

I’m a firm believer that money values should always be stored in the decimal datatype.  I can’t think of a time where I wouldn’t care about the precision and accuracy of money.

So let’s try converting our salaries to decimal(10,2):
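
Using the hypothetical dbo.Salaries table sketched above, that attempt would look something like this:

```sql
SELECT CAST(Salary AS decimal(10,2)) AS Salary
FROM dbo.Salaries;
-- Fails: Error converting data type nvarchar to numeric.
```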

Well, that didn’t work.  Maybe converting to floats will work as a quick fix?
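
Something like this, still against the same illustrative table:

```sql
SELECT CAST(Salary AS float) AS Salary
FROM dbo.Salaries;
-- Also fails: Error converting data type nvarchar to float.
```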

Nope, that didn’t work either.

(Sidenote: I’m OK with that though – I don’t think float should ever be used for storing money.)  If you want to see a quick example of why float math is problematic, take a look at this (and for more detail, read about it here):
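
Here's a quick sketch of the kind of floating point surprise I mean: adding 0.1 together ten times as a float doesn't land exactly on 1.

```sql
DECLARE @total float = 0,
        @i int = 0;

WHILE @i < 10
BEGIN
    SET @total = @total + CAST(0.1 AS float);
    SET @i = @i + 1;
END;

SELECT
    @total AS FloatTotal,                                   -- likely displays as 1...
    IIF(@total = 1.0, 'equal', 'not equal') AS VersusOne;   -- ...but is 'not equal' to 1
```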

OK, so those didn’t work.  What if we try converting to the money datatype – that should be able to read these money formats, right?
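
A sketch of that attempt against the same illustrative data; the money conversion ignores commas entirely, which is exactly what shortchanges the Turkish-formatted values:

```sql
SELECT
    Country,
    Salary,
    CAST(Salary AS money) AS ConvertedSalary
FROM dbo.Salaries;
-- The 'United States' rows convert as expected (e.g. 63387.00),
-- but '6.338,70' becomes 6.3387 because the comma is simply ignored.
```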

50% correct…!  Our employees in Turkey are being seriously underpaid by this conversion though.  I’m kind of glad not to have to rely on this solution, since the money datatype has its own fair share of problems as well.

Time to get Cultured

SQL Server’s TRY_PARSE function might be able to help us, but first we need to create a relationship between each country’s money formatting and its culture code:
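
One way to do that is with a small lookup table (again, the names and rows here are just an illustrative sketch):

```sql
CREATE TABLE dbo.Cultures
(
    Country nvarchar(50),
    CultureCode varchar(10)
);

INSERT INTO dbo.Cultures (Country, CultureCode) VALUES
(N'United States', 'en-US'),
(N'Turkey',        'tr-TR');
```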

And finally our SELECT query:
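
Roughly, it joins the salaries to their culture codes and lets TRY_PARSE do the work (assuming the illustrative tables above):

```sql
SELECT
    s.Country,
    s.Salary,
    TRY_PARSE(s.Salary AS decimal(10,2) USING c.CultureCode) AS ParsedSalary
FROM dbo.Salaries s
    INNER JOIN dbo.Cultures c
        ON s.Country = c.Country;
```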

Success!  Our salary values are now perfectly converted into the decimal datatype without the need for any ugly REPLACE(), SUBSTRING(), or other string parsing functions.

While this carefully curated demo correctly converted all of our values, it’s important to always test that the culture you choose correctly parses your string-formatted numbers.  For example, Wikipedia leads me to believe that the Danes write their numbers like “6 338,70”, but SQL Server’s culture definition doesn’t convert this correctly:
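
A quick way to check is to parse a known value directly and compare the result to what you expect:

```sql
SELECT TRY_PARSE(N'6 338,70' AS decimal(10,2) USING 'da-DK') AS ParsedValue;
-- Doesn't come back as the 6338.70 we'd hope for.
```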

In those instances, you may need to substitute another culture code to get the correct conversion to occur.

Extracting JSON Values Longer Than 4000 Characters

Published on: 2018-09-18

A while back I built an automated process that parses JSON strings into a relational format.

Up until recently this process had been working great: my output table had all of the data I was expecting, neatly parsed into the correct rows and columns.

Last week I noticed an error in the output table however.  One row that was supposed to have a nicely parsed JSON value for a particular column had an ugly NULL instead.

Truncated?

First I checked my source JSON string – it had the “FiveThousandAs” property I was looking for:
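
The real JSON isn't shown here, but a simplified stand-in looks like this (REPLICATE builds a 5000 character value):

```sql
DECLARE @json nvarchar(max) =
    N'{ "FiveThousandAs" : "' + REPLICATE(CAST(N'a' AS nvarchar(max)), 5000) + N'" }';

SELECT ISJSON(@json) AS IsValidJson;  -- 1, and the property is fully populated
```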

So the source data was fine.

I checked the table column I was inserting into as well and confirmed it was defined as nvarchar(max), so no problem there.

The last thing I checked was the query I was using:
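
In simplified form (pulling from the @json variable sketched above), it boiled down to this:

```sql
SELECT JSON_VALUE(@json, '$.FiveThousandAs') AS FiveThousandAs;
```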

If I run that on its own, I reproduce the NULL I was seeing inserted into my table.

JSON_VALUE is limiting

After a little bit more research, I discovered that the return type of JSON_VALUE is limited to 4000 characters.  Since JSON_VALUE runs in lax mode by default, if the value is longer than 4000 characters it fails silently and returns NULL.

To force an error in future code I could use SELECT JSON_VALUE(@json, 'strict $.FiveThousandAs') so at least I would be notified immediately of a problem with my query/data (via failure).

Although strict mode will notify me of issues sooner, it still doesn’t help me extract all of the data from my JSON property.

(Side note: I couldn’t define my nvarchar(max) column as NOT NULL because for some rows the value could be NULL, but in the future I might consider adding additional database validation with a check constraint).

OPENJSON

The solution to reading the entire 5000 character value from my JSON property is to use OPENJSON:
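
My insert query isn't reproduced here, but using the @json variable sketched above, the OPENJSON version boils down to something like this:

```sql
SELECT FiveThousandAs
FROM OPENJSON(@json)
    WITH (FiveThousandAs nvarchar(max) '$.FiveThousandAs');

-- The default (schema-less) form works too:
-- SELECT [value] FROM OPENJSON(@json) WHERE [key] = 'FiveThousandAs';
```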

My insert query needed to be slightly refactored, but now I’m able to return a value of any length (as long as it’s under 2GB).

In hindsight, I should have used OPENJSON() from the start: not only is it capable of parsing the full length values from JSON strings, but it performs significantly faster than any of the other SQL Server JSON functions.

As a best practice, I think I’m going to use OPENJSON by default for any JSON queries to avoid problems like this in the future.

Faking Temporal Tables with Triggers

Published on: 2018-09-11

This post is a response to this month’s T-SQL Tuesday #106 prompt by Steve Jones.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month’s topic asks to share our experiences with triggers in SQL Server.


Triggers are something that I rarely use.  I don’t shy away from them because of some horrible experience I’ve had; rather, I rarely have a good use case for them.

The one exception is when I need a poor man’s temporal table.

Temporal Table <3

When temporal tables were added in SQL Server 2016 I was quick to embrace them.

A lot of the data problems I work on benefit from being able to view what data looked like at a certain point back in time, so the easy setup and queryability of temporal tables was something that I immediately loved.

No System Versioning For You

Sometimes I can’t use temporal tables though, like when I’m forced to work on an older version of SQL Server.

Now, this isn’t a huge issue; I can still write queries on those servers to achieve the same result as I would get with temporal tables.

But temporal tables have made me spoiled.  They are easy to use and I like having SQL Server manage my data for me automatically.

Fake Temporal Tables With Triggers

I don’t want to have to manage my own operational versus historical data and write complicated queries for “point-in-time” analysis, so I decided to fake temporal table functionality using triggers.

Creating the base table and history table is pretty similar to creating a temporal table, just without all of the fancy PERIOD and GENERATED ALWAYS syntax:
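
The exact schema from my project isn't shown here, but a minimal sketch could look like this (BirdName and SightingCount are just illustrative columns):

```sql
CREATE TABLE dbo.Birds
(
    Id int IDENTITY(1,1) PRIMARY KEY,
    BirdName varchar(50),
    SightingCount int,
    SysStartTime datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    SysEndTime datetime2 NOT NULL DEFAULT '9999-12-31 23:59:59.9999999'
);

CREATE TABLE dbo.BirdsHistory
(
    Id int NOT NULL,
    BirdName varchar(50),
    SightingCount int,
    SysStartTime datetime2 NOT NULL,
    SysEndTime datetime2 NOT NULL
);

CREATE CLUSTERED INDEX CX_BirdsHistory_Id ON dbo.BirdsHistory (Id);
```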

The single UPDATE, DELETE trigger is really where the magic happens though.  Every time a row is updated or deleted, the trigger inserts the previous version of the row into our history table with the correct datetimes:
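
My original trigger isn't reproduced here, but a sketch of the idea, built on the illustrative tables above, could look like this:

```sql
CREATE TRIGGER dbo.TR_Birds_Versioning
ON dbo.Birds
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @CurrentDateTime datetime2 = SYSUTCDATETIME();

    -- Copy the previous version of each modified/deleted row into the history table,
    -- closing out its validity period as of right now
    INSERT INTO dbo.BirdsHistory (Id, BirdName, SightingCount, SysStartTime, SysEndTime)
    SELECT d.Id, d.BirdName, d.SightingCount, d.SysStartTime, @CurrentDateTime
    FROM deleted d;

    -- For updates, the surviving row in dbo.Birds starts a new validity period now.
    -- Assumes the database's RECURSIVE_TRIGGERS setting is OFF (the default),
    -- so this UPDATE doesn't re-fire the trigger.
    UPDATE b
    SET SysStartTime = @CurrentDateTime
    FROM dbo.Birds b
        INNER JOIN inserted i
            ON b.Id = i.Id;
END;
```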

The important aspect of this trigger is that we always join our dbo.Birds table to the inserted and deleted tables on the primary key, which is the Id column in this case.

If you try to insert/update/delete data from the dbo.Birds table, the dbo.BirdsHistory table will be updated exactly like a regular temporal table would:
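
For example, running batches like these one at a time (again using the illustrative columns):

```sql
INSERT INTO dbo.Birds (BirdName, SightingCount) VALUES ('Blue Jay', 1);
GO

UPDATE dbo.Birds SET SightingCount = 5 WHERE BirdName = 'Blue Jay';
GO

DELETE FROM dbo.Birds WHERE BirdName = 'Blue Jay';
GO

SELECT * FROM dbo.Birds;
SELECT * FROM dbo.BirdsHistory;
```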

If you run each of those batches one at a time and check both tables, you’ll see how the dbo.BirdsHistory table keeps track of all of our data changes.

Now seeing what our dbo.Birds data looked like at a certain point-in-time isn’t quite as easy as a system versioned table in SQL Server 2016, but it’s not bad:
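
With the current and history rows unioned together, an AS OF style query can be sketched like this (the date is hypothetical):

```sql
DECLARE @AsOf datetime2 = '2018-09-11 12:00:00';

SELECT Id, BirdName, SightingCount, SysStartTime, SysEndTime
FROM
(
    SELECT Id, BirdName, SightingCount, SysStartTime, SysEndTime FROM dbo.Birds
    UNION ALL
    SELECT Id, BirdName, SightingCount, SysStartTime, SysEndTime FROM dbo.BirdsHistory
) AS versions
WHERE @AsOf >= SysStartTime
    AND @AsOf < SysEndTime;
```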

Real Performance

One reason many people loathe triggers is their potential for bad performance (particularly when many triggers get chained together).

I wanted to see how this trigger solution compares to an actual temporal table.  While searching for good ways to test this difference, I found that Randolph West has done some testing on trigger-based temporal tables.  While our solutions are different, I like their performance testing methodology: view the transaction log records for real temporal tables and compare them to those of the trigger-based temporal tables.
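
The gist of that methodology is to peek at the log records each approach generates, with something like the undocumented fn_dblog function (this is my paraphrase, not necessarily the exact query from that post):

```sql
-- fn_dblog is undocumented/unsupported; use it only for this kind of ad hoc comparison
SELECT [Current LSN], Operation, Context, AllocUnitName
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE 'dbo.Birds%';
```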

I’ll let you read the details of how to do the comparison test in their blog post but I’ll just summarize the results of my test: the trigger based version is almost the same as a real system versioned temporal table.

Because of how I handle updating the SysStartTime column in my dbo.Birds table, I get one more transaction than a true temporal table.

You could make the trigger solution work identically to a true temporal table (as Randolph does) if you are willing to make application code changes to populate the SysStartTime column on insert into dbo.Birds.

Conclusion

For my purposes, the trigger-based temporal table solution has a happy ending.  It works for the functionality that I need it for and prevents me from having to manage a history table through some other process.

If you decide to use this in your own pre-2016 instances, just be sure to test the functionality you need; while it works great for the purposes that I use temporal tables for, your results may vary if you need additional functionality (preventing truncates on the history table, defining a retention period for the history, etc… are all features not implemented in the examples above).
