Shortchanged with International Money in SQL Server

Published on: 2018-09-25

Imagine you have to perform some salary analysis for your employer International Mega Corp.

The data you have to work with looks something like this:

DROP TABLE IF EXISTS ##InternationalMegaCorpSalaries;
GO
CREATE TABLE ##InternationalMegaCorpSalaries
(
	EmployeeId int IDENTITY,
	Salary nvarchar(10),
	Country nvarchar(20)
);
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('1,000.00','United States');
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('8 789,37','Sweden');
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('6.274,26','Turkey');
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('1000.00','United States');

Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods?

Great question! Someone wanted to make sure these amounts would look good in the UI, so they figured storing the formatted values in the database was the way to go…

Pretty for the UI; not so great when you need to run analysis on them.

Thanks to Zanoni Labuschagne, one of the subscribers to my YouTube channel, for recommending this topic!

CONVERT!

I’m a firm believer that money values should always be stored in the decimal datatype. I can’t think of a time when I wouldn’t care about the precision and accuracy of money.

So let’s try converting our salaries to decimal(10,2):

SELECT 
	EmployeeId,
	TRY_CONVERT(decimal(10,2),Salary) AS Salary,
	Country 
FROM ##InternationalMegaCorpSalaries
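
Spot-checking each formatted value on its own shows why (the commented results are what I’d expect, since converting to decimal doesn’t tolerate group separators or embedded spaces):

SELECT 
	TRY_CONVERT(decimal(10,2),'1,000.00') AS UnitedStates, -- NULL
	TRY_CONVERT(decimal(10,2),'8 789,37') AS Sweden,       -- NULL
	TRY_CONVERT(decimal(10,2),'6.274,26') AS Turkey,       -- NULL
	TRY_CONVERT(decimal(10,2),'1000.00')  AS Unformatted   -- 1000.00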

Well, that didn’t work.  Maybe converting to floats will work as a quick fix?

SELECT 
	EmployeeId,
	TRY_CONVERT(float,Salary) AS Salary,
	Country 
FROM ##InternationalMegaCorpSalaries

Nope, that didn’t work either.

Side note: I’m OK with that though – I don’t think float should ever be used for storing money. If you want to see a quick example of why float math is problematic, take a look at this (and for more detail, read about it here):

DECLARE 
	@Num1 float = .15,
	@Num2 float = .15,
	@Num3 float = .1,
	@Num4 float = .2
-- Not equal
SELECT IIF(@Num1+@Num2 = @Num3+@Num4,1,0)

OK, so those didn’t work. What if we try converting to the money datatype – that should be able to read these money formats, right?

SELECT 
	EmployeeId,
	TRY_CONVERT(money,Salary) AS Salary,
	Country 
FROM ##InternationalMegaCorpSalaries
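
Spot-checking the values again shows what the money conversion is doing to each format (the commented results are what I’d expect from how money handles separators):

SELECT 
	TRY_CONVERT(money,'1,000.00') AS UnitedStates, -- 1000.00: the comma is treated as a group separator
	TRY_CONVERT(money,'8 789,37') AS Sweden,       -- NULL: the embedded space isn't valid money syntax
	TRY_CONVERT(money,'6.274,26') AS Turkey        -- 6.2743: the comma is ignored and the period becomes the decimal point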

50% correct…! Our employees in Turkey are being seriously underpaid by that conversion though. I’m kind of glad not to have to rely on this solution, since the money datatype has its own fair share of problems as well.

Time to get Cultured

SQL Server’s TRY_PARSE function might be able to help us, but first we need to create a relationship between each country’s money formatting and its culture code:

ALTER TABLE ##InternationalMegaCorpSalaries
ADD CultureCode AS CASE Country 
						WHEN 'United States' THEN 'en-US'
						WHEN 'Sweden' THEN 'sv-SE'
						WHEN 'Turkey' THEN 'tr-TR' END

And finally our SELECT query:

SELECT 
	EmployeeId,
	TRY_PARSE(Salary AS DECIMAL(10,2) USING CultureCode) AS Salary,
	Country 
FROM ##InternationalMegaCorpSalaries
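
Each salary is now parsed using its own culture’s formatting rules; row by row, that’s roughly the equivalent of running:

SELECT 
	TRY_PARSE('1,000.00' AS decimal(10,2) USING 'en-US') AS UnitedStates, -- 1000.00
	TRY_PARSE('8 789,37' AS decimal(10,2) USING 'sv-SE') AS Sweden,       -- 8789.37
	TRY_PARSE('6.274,26' AS decimal(10,2) USING 'tr-TR') AS Turkey        -- 6274.26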

Success!  Our salary values are now perfectly converted into the decimal datatype without the need for any ugly REPLACE(), SUBSTRING(), or other string parsing functions.

While this carefully curated demo correctly converted all of our values, it’s important to always test that the culture you choose correctly parses your formatted number strings. For example, Wikipedia leads me to believe that the Danes write their numbers like “6 338,70”, but SQL Server’s culture definition doesn’t convert this correctly:

SELECT TRY_PARSE('6 338,70' AS DECIMAL(10,2) USING 'da-dk')

In those instances, you may need to substitute another culture code to get the correct conversion to occur.
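
For this particular value, any culture that uses a space for grouping and a comma for decimals should do the trick; for example, the same 'sv-SE' culture that handled our Swedish salaries:

SELECT TRY_PARSE('6 338,70' AS DECIMAL(10,2) USING 'sv-SE') -- 6338.70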


Extracting JSON Values Longer Than 4000 Characters

Published on: 2018-09-18

A while back I built an automated process that parses JSON strings into a relational format.

Up until recently this process had been working great: my output table had all of the data I was expecting, neatly parsed into the correct rows and columns.

Last week, however, I noticed an error in the output table. One row that was supposed to have a nicely parsed JSON value for a particular column had an ugly NULL instead.

Truncated?

First I checked my source JSON string – it had the “FiveThousandAs” property I was looking for:

DECLARE @json nvarchar(max) = N'{
    "Id" : 1,
    "FiveThousandAs" : "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "FourAs" : "aaaa"
}';

So the source data was fine.

I checked the table column I was inserting into as well and confirmed it was defined as nvarchar(max), so no problem there.

The last thing I checked was the query I was using:

SELECT JSON_VALUE(@json, '$.FiveThousandAs')
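-- Returns NULL, even though the property exists and has a value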

If I run that on its own, I reproduce the NULL I was seeing inserted into my table.

JSON_VALUE is limiting

After a little more research, I discovered that the return type of JSON_VALUE is limited to nvarchar(4000). Since JSON_VALUE runs in lax mode by default, if the extracted value is longer than 4000 characters it fails silently and returns NULL.

To force an error in future code, I could run JSON_VALUE in strict mode, so at least I would be notified immediately of a problem with my query/data (via failure):
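
SELECT JSON_VALUE(@json, 'strict $.FiveThousandAs')
-- In strict mode the query raises an error instead of silently returning NULL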

Although strict mode will notify me of issues sooner, it still doesn’t help me extract all of the data from my JSON property.

(Side note: I couldn’t define my nvarchar(max) column as NOT NULL because for some rows the value could be NULL, but in the future I might consider adding additional database validation with a check constraint).

OPENJSON

The solution to reading the entire 5000 character value from my JSON property is to use OPENJSON:

SELECT * 
FROM OPENJSON(@json) 
WITH (FiveThousandAs nvarchar(max) '$.FiveThousandAs')

My insert query needed to be slightly refactored, but now I’m able to return a value of any length (as long as it’s under 2 GB).
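
A refactored insert along these lines does the trick (the destination table and its columns here are made up for illustration):

INSERT INTO dbo.ParsedJson (Id, FiveThousandAs) -- hypothetical destination table
SELECT
	j.Id,
	j.FiveThousandAs
FROM OPENJSON(@json)
	WITH (
		Id int '$.Id',
		FiveThousandAs nvarchar(max) '$.FiveThousandAs'
	) AS j;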

In hindsight, I should have used OPENJSON() from the start: not only is it capable of parsing the full length values from JSON strings, but it performs significantly faster than any of the other SQL Server JSON functions.

As a best practice, I think I’m going to use OPENJSON by default for any JSON queries to avoid problems like this in the future.



Ignoring NULLs with FIRST_VALUE

Published on: 2018-08-28

Watch this week’s episode on YouTube.

The SQL Server FIRST_VALUE function makes it easy to return the “first value in an ordered set of values.”

The problem is that if that first value happens to be a NULL, there is no easy, built-in way to skip it.

While a UserVoice item exists to add the ability to ignore NULLs (go vote!), today we’re going to accomplish that end result with some alternative queries.

The Setup

Here’s the example data we’ll be skipping nulls on:

CREATE TABLE ##Data
(
       Id int IDENTITY(0,1),
       GroupId int,
       Value1 int
);
GO
INSERT INTO ##Data VALUES (1,1)
INSERT INTO ##Data VALUES (1,1)
INSERT INTO ##Data VALUES (1,3)
INSERT INTO ##Data VALUES (2,NULL)
INSERT INTO ##Data VALUES (2,NULL)
INSERT INTO ##Data VALUES (2,6)
INSERT INTO ##Data VALUES (2,4)
INSERT INTO ##Data VALUES (2,5);
GO

We’ve got an integer identity column, two groups of rows, and NULLs sprinkled into otherwise unsuspecting integer values.

If we write a query that uses the FIRST_VALUE function, you’ll notice that our NULL gets chosen in group two – not quite what we want:

SELECT
       Id,
       GroupId,
       Value1,
       FIRST_VALUE(Value1) OVER (PARTITION BY GroupId ORDER BY Id) AS FirstValue1
FROM
       ##Data
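/* The result: FIRST_VALUE happily picks up the NULLs in group 2

   Id  GroupId  Value1  FirstValue1
   0   1        1       1
   1   1        1       1
   2   1        3       1
   3   2        NULL    NULL
   4   2        NULL    NULL
   5   2        6       NULL
   6   2        4       NULL
   7   2        5       NULL
*/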

Let’s look at two queries that will help us get the number 6 into that FirstValue1 column for the second group.

The Contenders

“The Derived FIRST_VALUE”

First up is still the FIRST_VALUE function, but inside of a derived table:

SELECT
    d.Id,
    d.GroupId,
    d.Value1,
    d2.FirstNotNullValue1
FROM
    ##Data d
    INNER JOIN
    (
    SELECT DISTINCT
        GroupId,
        FIRST_VALUE(Value1) OVER (PARTITION BY GroupId ORDER BY Id) as FirstNotNullValue1
    FROM ##Data
    WHERE Value1 IS NOT NULL
    ) d2
        ON d.GroupId = d2.GroupId
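-- Same eight rows as before, but FirstNotNullValue1 now shows 1 for every
-- group 1 row and 6 for every group 2 row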

Because we filter out NULLs in the derived table query, FIRST_VALUE returns the first non-NULL value like we want. We then join that back to the original data and all is right again.

“The Triple Join”

Our second attempt at this query sends us back to the dark ages of SQL Server 2008 before the FIRST_VALUE function existed:

SELECT
    d.Id,
    d.GroupId,
    d.Value1,
    d2.Value1 AS FirstNotNullValue1
FROM
    ##Data d
    LEFT JOIN
    (
    SELECT
        GroupId,
        MIN(Id) AS FirstNotNullIdValue1
    FROM
        ##Data
    WHERE
        Value1 IS NOT NULL
    GROUP BY
        GroupId
    ) m
        ON d.GroupId = m.GroupId
    INNER JOIN ##Data d2
        ON m.FirstNotNullIdValue1 = d2.Id;

We perform a triple join, with the critical element being the derived table that gets the MIN Id for each group of rows where Value1 IS NOT NULL. Once we have the minimum Id for each group, we join back to the original data and produce the same final result as the first query.

The Performance

Both of the above queries produce the same output – which one should you use in your production code?

Well, the “Derived FIRST_VALUE” query has a lower relative cost than the “Triple Join” query, so maybe it’s better?

This isn’t a real-world execution plan though – surely we never scan heaps in our production environments.

Let’s add a quick clustered index and see if that changes anything:

CREATE CLUSTERED INDEX CL_Id ON ##Data (GroupId,Id,Value1)

Okay, a closer matchup, but the “Derived FIRST_VALUE” query still appears to have a slight edge.

If we SET STATISTICS IO ON though, we start to see a different story:

With only 8 rows of data, our “Derived FIRST_VALUE” query sure is performing a lot of reads.

What if we increase the size of our sample dataset?

SET STATISTICS IO, TIME OFF;
SET NOCOUNT ON;
GO
INSERT INTO ##Data (GroupId, Value1)  
SELECT GroupId, Value1 FROM ##Data
GO 10
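-- Each batch doubles the table, so after 10 executions the original
-- 8 rows have grown to 8 * 2^10 = 8,192 rows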

And now check our plans and stats IO:

WOW that’s a lot of reads in the “Derived FIRST_VALUE” query.

Conclusion

Besides sharing some solutions, the point I tried to make above is this: DON’T TRUST CODE YOU FIND ON THE INTERNET (or in books, or copied from colleagues, etc.).

Both of the above queries will return the first non-NULL value. But they probably won’t perform for you exactly the same way they did in my examples above.

Copy the above code for sure – but test it out. See what works better on your specific server configuration, data size, and indexes.  Maybe both queries are terrible and you need a third, better way of doing it (if you write one, let me know!) – but please, please, please, always test your code.
