Data with Bert logo

CHOOSE() in SQL Server

Watch this week's video on YouTube

While I know I don't utilize most of the features available in SQL Server, I like to think I'm at least aware that those features exist.

This week I found a blind-spot in my assumption however. Even though it shipped in SQL Server 2012, the SQL Server CHOOSE function is a feature that I think I'm seeing for the first time this past week.

CHOOSE is CASE

CHOOSE returns the n-th item from a comma-delimited list.

Whenever learning a new feature in SQL Server I try to think of a good demo I could build to test out the functionality. In this case the immediate example that came to mind was building something that would provide a lookup of values:

SELECT 
    [key],
    [value],
    [type],
    CHOOSE(type+1,'null','string','int','boolean','array','object') AS JsonType
FROM
    OPENJSON(N'{
        "Property1":null,
        "Property2":"a",
        "Property3":3,
        "Property4":false,
        "Property5":[1,2,"3"],
        "Property6":{
            "SubProperty1":"a"
        }
    }');

In this case, the OPENJSON function returns a "type" field that indicates the datatype of that particular JSON property's value. The issue is that the "type" column is numeric and I can never remember what type of data each number represents.

The above query solves this by using CHOOSE to create a lookup of values. Since OPENJSON returns results starting with 0, we need to use type+1 in order to get the 1-based CHOOSE function to work correctly:

2019-06-03-21-08-58

If we look at the CHOOSE function's scalar operator properties in the execution plan, we'll see that this function is just a fancy alias for a more verbose CASE statement:

[Expr1000] = Scalar Operator(
    CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(1) 
    THEN 'null' 
    ELSE 
        CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(2) 
        THEN 'string' 
        ELSE 
            CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(3) 
            THEN 'int' 
            ELSE 
                CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(4) 
                THEN 'boolean' 
                ELSE 
                    CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(5) 
                    THEN 'array' 
                    ELSE 
                        CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(6) 
                        THEN 'object' 
                        ELSE NULL END 
                    END 
                END 
            END 
        END 
    END
)

The Set-Based Way

I think one of the reasons I've never used CHOOSE is because I would hate typing up all of those lookup values and trapping them in a SELECT statement, never to be used again.

Previously, I would have stored the lookup values in table and joined them with the OPENJSON results to accomplish the same end result:

DROP TABLE IF EXISTS #JsonType;
CREATE TABLE #JsonType
(
    Id tinyint,
    JsonType varchar(20),
    CONSTRAINT PK_JsonTypeId PRIMARY KEY CLUSTERED (Id)
);

INSERT INTO #JsonType VALUES (0,'null');
INSERT INTO #JsonType VALUES (1,'string');
INSERT INTO #JsonType VALUES (2,'int');
INSERT INTO #JsonType VALUES (3,'boolean');
INSERT INTO #JsonType VALUES (4,'array');
INSERT INTO #JsonType VALUES (5,'object');

SELECT 
    j.[key],
    j.[value],
    j.[type],
    t.JsonType
FROM
    OPENJSON(N'{
        "Property1":null,
        "Property2":"a",
        "Property3":3,
        "Property4":false,
        "Property5":[1,2,"3"],
        "Property6":{
                        "SubProperty1":"a"
                    }
    }') j
    INNER JOIN #JsonType t
        ON j.[type] = t.Id

While more initial setup is involved with this solution, it's more flexible long-term. With a centralized set of values, there's no need to update the CHOOSE function in all of your queries when you can update the values in a single lookup table.

And while I didn't bother performance testing it, by virtue of being a scalar function, CHOOSE will probably perform worse in many real-world scenarios when compared to the table-based lookup approach (eg. large datasets, parallel plans, etc...).

CHOOSE What Works For You

I'm not surprised that it took me this long to learn about the CHOOSE function: while a simplified way to write certain CASE statements, I can't think of many (any?) scenarios where I would prefer to use it over a CASE or a lookup-table solution.

Predicate Execution Order on Mixed Data Type Columns

Watch this week's video on YouTube

SQL Server's cost-based query optimizer does a pretty good job of figuring out what order to filter your data to get fast query executions. It considers things like index coverage, data distribution, and much more to decide how to retrieve your query's data.

However, these good intentions can become problematic in certain situations where you know more about your data than SQL Server does. When this happens, the order SQL Server chooses to execute predicates is important not just for performance of your query but for the business logic as well.

A Column With Mixed Data Types

Let's look at the following example table and data:

USE master;
DROP DATABASE IF EXISTS MixedDataTypes;
CREATE DATABASE MixedDatatypes;
USE MixedDatatypes;
GO

CREATE TABLE dbo.Pages
(
    Id int identity,
    PageName varchar(20),
    DataValue varchar(100),
    DataType varchar(20),
    CONSTRAINT PK_Id PRIMARY KEY (Id)
);
GO
INSERT INTO dbo.Pages VALUES ('StringsOnlyPage 1','abc','string')
GO 2000
INSERT INTO dbo.Pages VALUES ('NumbersOnlyPage 1','1.20','decimal')
GO 2000
INSERT INTO dbo.Pages VALUES ('NumbersOnlyPage 2','1.20','decimal')
GO 2000
INSERT INTO dbo.Pages VALUES ('MixedDataTypesPage 1','abc','string')
GO 1000
INSERT INTO dbo.Pages VALUES ('MixedDataTypesPage 1','1.20','decimal')
GO 1000

This table stores data for an application that has many different types of Pages. Each Page stores different types of data, but instead of creating a separate table for each type, we store all the different data in the varchar DataValue column and maintain the original data type in the DataType column.

This structure reduces the complexity required for maintaining our database (compared to creating possibly hundreds of tables, one for each PageName) and makes querying easier (only need to query one table). However, this design could also lead to some unexpected query results.

Filtering Mixed Data Values

Let's say we want to retrieve all data from one table with where the value is 1.2:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName = 'NumbersOnlyPage 1' AND DataValue = '1.2'

This query runs fine. The problem is since our original data type was a decimal with a value of 1.20, this string-based comparison doesn't work. What we really want to have happen is a numeric comparison in our predicate:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName = 'NumbersOnlyPage 1' AND DataValue = 1.2

2019-05-20-20-34-01

While the implicit conversion occurring on the table's DataValue column is not ideal, if the number of rows it needs to convert is small it's not so bad (plus, this isn't the point of today's post, so try and look past it for a few more moments).

Here comes the fun: what if we want to check all our Pages that contain numeric data for values of 1.2? We could write this query in a couple of different ways:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2
--or
SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName in ('NumbersOnlyPage 1','NumbersOnlyPage 2') AND DataValue = 1.2

For both queries, we receive the error "Error converting data type varchar to numeric".

2019-05-20-20-34-40

Why? In this case SQL Server decides to do the implicit conversions on the DataValue column first before filtering on our PageName columns.

Up until this last query, SQL Server was deciding that it would be more efficient to filter the rows down to the specific Page first and then do the implicit conversions on the DataValue column. However, now that we are selecting more than one table, SQL Server says determines it has to scan everything anyway, it might as well do all of the implicit conversions first and filter on table names later.

The problem of course is that all our DataValue values are not numeric. In this case the order of the predicates does matter, not for performance but to be able to correctly execute the business logic that we defined as part of our query.

Not Good Solutions

One way we can fix this is to tempt SQL Server to filter on PageName first by adding an index:

CREATE NONCLUSTERED INDEX IX_PageName ON dbo.Pages (PageName) INCLUDE (DataValue);

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2

This works great. SQL Server decides that since this index covers all the fields in our query, and because the index key is PageName, it will filter the rows on PageName first and perform the implicit conversions on the remaining rows.

The problem with this is that it's not guaranteed. Something may happen that will cause SQL Server not use this index in the future: our index doesn't cover our query anymore, we add some additional filtering, the index is removed so it can be replaced by a different index that will no longer be selected for this particular query, etc...

It just isn't a reliable option.

Plus it doesn't work in all scenarios. Let's say we parameterize the PageName and use the STRING_SPLIT() function to filter our Pages to only those passed in:

DECLARE @PageNames varchar(100) = 'NumbersOnlyPage 1,NumbersOnlyPage 2';
SELECT *
FROM dbo.Pages
WHERE PageName in (SELECT value FROM string_split(@PageNames,',')) AND DataValue = 1.2

We are back to square one since in this case STRING_SPLIT() needs to parse the PageName data first and then join it in with the rest of the data, causing our original failure scenario (this is the estimated execution plan):

2019-05-20-20-36-08

Other Options

So while indexing seems to fix the solution, it's not guaranteed to work 100% of the time.

Obviously we could not store data in this format, but that would add complexity to the database and app.

We could try to add the PageName filter into a derived table and force the join order, but that's ugly and will force us to read the table multiple times.

Since we also have data type information available for each row, we might consider utilizing that information:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2 AND DataType = 'decimal'

But once again if this works it's through sheer luck.

TRY_CONVERT() is another option. This function returns nulls if it can't convert to a decimal:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND TRY_CONVERT(decimal(2,1),DataValue) = 1.2 

This is actually a pretty good option since it's guaranteed to work regardless of which column SQL Server filters on first. If the number of DataValues you have to TRY and CONVERT is relatively small though, this may be your best choice.

For better performance, you can create a second column that contains data in decimal (or any other type) format:

ALTER TABLE dbo.Pages
ADD DataValueDecimal AS TRY_CONVERT(decimal(2,1),DataValue) PERSISTED

You could index both DataValue* columns and your performance would be pretty good. The downside here of course is that your app queries will have to change to match the new table structure:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND CASE WHEN DataType = 'decimal' THEN DataValueDecimal ELSE DataValue END = 1.2 

In conclusion, it's tough to say what the best option is for this type of scenario. However, it's important to keep in mind that if you decide to structure and write your queries in this format, you need to plan for order of operation issues and handle errors gracefully.

Animating a Magic 8 Ball in SQL Server Management Studio

MJ-t-sql-Tuesday

This post is a response to this month's T-SQL Tuesday #114 prompt by Matthew McGiffen.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month. This month Matthew asks us to write about puzzles, so I decided to recreate a childhood favorite in SQL Server.


Watch this week's video on YouTube

As a kid, I found Magic 8 Balls alluring. There is something appealing about a who-knows-how-many-sides die emerging from the depths of a mysterious inky blue fluid to help answers life's most difficult questions.

I never ended up buying a magic eight ball of my own though, so today I'm going to build and animate one in SQL Server Management Studio.

Fun and Valuable? Signs point to yes.

While building a magic eight ball in SQL Server is not the most useful project in the world it is:

  1. Fun
  2. A great way to learn lots of cool SSMS and SQL tips and tricks to use in more useful situations.

Here's an example of the finished project followed by all of the components that make this project work. The full code for this solution can be found at the bottom of this post.

SSMSMagic8Ball

VALUES()

I needed a way to store all of the Magic 8 Ball messages. Some days I like UNIONing together a bunch of SELECT statements, but for these "larger" static datasets I like the syntax of VALUES().

SELECT * FROM 
(VALUES  
    ('It is certain.'), 
    ('It is decidedly so.'), 
    ('Without a doubt.'), 
    ('Yes - definitely.'), 
    ('You may rely on it.'), 
    ('As I see it, yes.'), 
    ('Most likely.'), 
    ('Outlook good.'), 
    ('Yes.'), 
    ('Signs point to yes.'), 

    ('Reply hazy, try again.'), 
    ('Ask again later.'), 
    ('Better not tell you now.'), 
    ('Cannot predict now.'), 
    ('Concentrate and ask again.'), 

    ('Don''t count on it.'), 
    ('My reply is no.'), 
    ('My sources say no.'), 
    ('Outlook not so good.'), 
    ('Very doubtful.') 
) T(Response) 

ORDER BY NEWID()

After we create our data set of static messages, we need to randomly return 1 message for every shake of the eight ball. My favorite way to return one random record is to order the data by NEWID() (creating a random order for values) and then using TOP 1 to return only the first random record:

DECLARE @Message varchar(100) = '';

WITH MagicResponses AS ( 
    ...<VALUES() query from above>...
)

SELECT TOP 1 @Message = Response FROM MagicResponses ORDER BY NEWID();

Table Driven Animation

While I never have used this technique for animating an image before, I have used a control table to drive what data should get processed in an ETL.

In today's case, instead of saving the values of what data was last manipulated in an SSIS package, I'll be storing what each action each frame of animation should display, as well as how much delay to put in between each frame:

CREATE TABLE dbo.AnimationControl
(
    Id int IDENTITY PRIMARY KEY,
    ActionToTake varchar(20),
    DelayToTake varchar(20),
    ActionTakenDate datetime2

);

INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeLeft','00:00:00.100');
INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeRight','00:00:00.100');
INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeLeft','00:00:00.100');
INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeRight','00:00:00.100');
INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeLeft','00:00:00.100');
INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeRight','00:00:00.100');
INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('Reveal','00:00:00.500');

WAITFOR DELAY

I wanted there to be a different delay between certain animation frames (I believe the final message reveal deserves a slightly more dramatic pause), so I'm using WAITFOR DELAY to achieve that.

WAITFOR DELAY @DelayToTake;

PRINT

The goal here is to print this ascii 8 ball shaking left and right before displaying the message. We do this using good old fashioned PRINT(). After printing a particular frame we update our control table to indicate that particular frame has been drawn.

IF @CurrentActionType = 'ShakeLeft'
BEGIN 
    PRINT(' 
                _.a$$$$$a._ 
              ,$$$$$$$$$$$$$. 
            ,$$$$$$$$$$$$$$$$$. 
           d$$$$$$$$$$$$$$$$$$$b 
          d$$$$$$$$~`"`~$$$$$$$$b 
         ($$$$$$$p   _   q$$$$$$$) 
         $$$$$$$$   (_)   $$$$$$$$ 
         $$$$$$$$   (_)   $$$$$$$$ 
         ($$$$$$$b       d$$$$$$$) 
          q$$$$$$$$a._.a$$$$$$$$p 
           q$$$$$$$$$$$$$$$$$$$p 
            `$$$$$$$$$$$$$$$$$` 
              `$$$$$$$$$$$$$` 
                `~$$$$$$$~` 
    ') 
END 

GO

You might be wondering why I decided to use a control table to dictate what images to animate. The trouble was that in order to get the PRINT to actually display our ascii images on screen in SSMS, the batch needed to finish submitting. So each frame we print needs to be part of its own batch.

Since we have 7 frames in our animation, we need to execute our procedure 7 times.

Alternatively we can use GO 7, but then we get that ugly batch execution completed message which I don't think there is anyway to hide:

EXEC dbo.USP_ShakeThe8Ball;
GO 7

-- OR
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO

Completely Useless? I guess not

I've always been a fan of occasionally taking a break to build things for pure fun. It's a good way to apply lesser known features to your code, stretch your creativity for solving problems, and of course push software functionality to their limits through feature abuse.

Here is the full set of code if you want to run it for yourself (note, this works on a 1920x1080 resolution monitor with SSMS at full screen...your results may vary):

CREATE OR ALTER PROCEDURE dbo.USP_ShakeThe8Ball
AS
BEGIN
    /* Hide extra output to the messages window that will ruin our animation */
    SET NOCOUNT ON;
    SET ANSI_WARNINGS OFF;

    /* Set up a table to keep track of our animation frames and insert into it */
    IF OBJECT_ID('dbo.AnimationControl') IS NULL
    BEGIN
        CREATE TABLE dbo.AnimationControl
        (
            Id int IDENTITY PRIMARY KEY,
            ActionToTake varchar(20),
            DelayToTake varchar(20),
            ActionTakenDate datetime2

        );
    END;

    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeLeft','00:00:00.100');
    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeRight','00:00:00.100');
    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeLeft','00:00:00.100');
    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeRight','00:00:00.100');
    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeLeft','00:00:00.100');
    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('ShakeRight','00:00:00.100');
    INSERT INTO dbo.AnimationControl (ActionToTake,DelayToTake) VALUES ('Reveal','00:00:00.500');


    DECLARE @CurrentActionId int = 1;
    DECLARE @CurrentActionType varchar(20) = 'ShakeLeft';
    DECLARE @DelayToTake varchar(20) = '00:00:00.100';

    /* If more than 1 second elapsed, clear the control table */
    DECLARE @LastRunDate datetime2;
    SELECT @LastRunDate = MAX(ActionTakenDate) FROM dbo.AnimationControl;

    IF DATEDIFF(millisecond,@LastRunDate,GETDATE()) > 1000
    BEGIN
        UPDATE dbo.AnimationControl SET ActionTakenDate = NULL;
    END

    /* Which action/frame are we currently on? */

    SELECT @CurrentActionId = MIN(Id) FROM dbo.AnimationControl WHERE ActionTakenDate IS NULL;
    SELECT @CurrentActionType = ActionToTake,
            @DelayToTake = DelayToTake
        FROM dbo.AnimationControl WHERE Id = @CurrentActionId


    WAITFOR DELAY @DelayToTake;
    /* Since we can't clear the Messages window, we need to fill it with
    blank space between animation frames to achieve the desired effect */


    PRINT(' 




        '); 


    IF @CurrentActionType = 'ShakeLeft'
    BEGIN 

        PRINT(' 
                  _.a$$$$$a._ 
                 ,$$$$$$$$$$$$$. 
               ,$$$$$$$$$$$$$$$$$. 
              d$$$$$$$$$$$$$$$$$$$b 
             d$$$$$$$$~`"`~$$$$$$$$b 
            ($$$$$$$p   _   q$$$$$$$) 
            $$$$$$$$   (_)   $$$$$$$$ 
            $$$$$$$$   (_)   $$$$$$$$ 
            ($$$$$$$b       d$$$$$$$) 
             q$$$$$$$$a._.a$$$$$$$$p 
              q$$$$$$$$$$$$$$$$$$$p 
               `$$$$$$$$$$$$$$$$$` 
                 `$$$$$$$$$$$$$` 
                   `~$$$$$$$~` 
        ') 

    END 

    If @CurrentActionType = 'ShakeRight' 

    BEGIN 

        PRINT(' 
                      _.a$$$$$a._ 
                     ,$$$$$$$$$$$$$. 
                   ,$$$$$$$$$$$$$$$$$. 
                  d$$$$$$$$$$$$$$$$$$$b 
                 d$$$$$$$$~`"`~$$$$$$$$b 
                ($$$$$$$p   _   q$$$$$$$) 
                $$$$$$$$   (_)   $$$$$$$$ 
                $$$$$$$$   (_)   $$$$$$$$ 
                ($$$$$$$b       d$$$$$$$) 
                 q$$$$$$$$a._.a$$$$$$$$p 
                  q$$$$$$$$$$$$$$$$$$$p 
                   `$$$$$$$$$$$$$$$$$` 
                     `$$$$$$$$$$$$$` 
                       `~$$$$$$$~` ') 

    END 

    IF @CurrentActionType = 'Reveal'
    BEGIN

        DECLARE @Message varchar(100) = '';

        WITH MagicResponses AS ( 
        SELECT * FROM 
        (VALUES  
            ('It is certain.'), 
            ('It is decidedly so.'), 
            ('Without a doubt.'), 
            ('Yes - definitely.'), 
            ('You may rely on it.'), 
            ('As I see it, yes.'), 
            ('Most likely.'), 
            ('Outlook good.'), 
            ('Yes.'), 
            ('Signs point to yes.'), 

            ('Reply hazy, try again.'), 
            ('Ask again later.'), 
            ('Better not tell you now.'), 
            ('Cannot predict now.'), 
            ('Concentrate and ask again.'), 

            ('Don''t count on it.'), 
            ('My reply is no.'), 
            ('My sources say no.'), 
            ('Outlook not so good.'), 
            ('Very doubtful.') 
        ) T(Response) 
        )

        SELECT TOP 1 @Message = Response FROM MagicResponses ORDER BY NEWID();

        BEGIN 

        PRINT(' 
                      _.a$$$$$a._ 
                     ,$$$$$$$$$$$$$. 
                   ,$$$$$$$$$$$$$$$$$. 
                  d$$$$$$$$$$$$$$$$$$$b 
                 d$$$$$$$$~`"`~$$$$$$$$b 
                ($$$$$$$p   _   q$$$$$$$) 
                $$$$$$$$   (_)   $$$$$$$$           ' + @Message + '
                $$$$$$$$   (_)   $$$$$$$$ 
                ($$$$$$$b       d$$$$$$$) 
                 q$$$$$$$$a._.a$$$$$$$$p 
                  q$$$$$$$$$$$$$$$$$$$p 
                   `$$$$$$$$$$$$$$$$$` 
                     `$$$$$$$$$$$$$` 
                       `~$$$$$$$~` ') 

        END 
    END

        PRINT(' 




        '); 

    UPDATE dbo.AnimationControl SET ActionTakenDate = GETDATE() WHERE Id = @CurrentActionId;

END;
GO



/*
CTRL+T first to show Results as Text

Then highlight and execute the following:

EXEC dbo.USP_ShakeThe8Ball;
GO 7

-- OR
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO
EXEC dbo.USP_ShakeThe8Ball;
GO

*/

Data Type Precedence and Implicit Conversions

Watch this week's video on YouTube

SQL Server needs to make sure data types match when performing operations that involve multiple pieces of data.

When the data types do not match, SQL Server has to implicitly convert the data before performing any operations.

While most of the time these implicit conversions go unnoticed, they are important to understand since they can lead to unexpected results.

When 4.4/.44 Doesn't Equal 10

Let's start with this example:

SELECT 4.4/CAST(.44 AS VARCHAR(5))

Ignoring for a moment that our denominator is of type VARCHAR, if we do some quick mental math or use a calculator, we can see that the answer should be 10:

Simple-division

However, if we look at the result SQL Server returns, it's strangely 11:

2019-04-08-16-24-28

To understand why this happens, we need to understand SQL Server's data type precedence logic.

Data Type Precedence

If we start with a simpler version of this example, we'll see SQL Server does in fact know how to perform math and return an answer of 10:

SELECT 4.4/.44

2019-04-08-16-27-16

We can use the SQL_VARIANT_PROPERTY() function to see what data types SQL Server is assuming we are using in our calculation:

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(.44,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4/.44,'BaseType')

2019-04-08-16-27-58

In this case, since the data types of both components are numeric, SQL Server doesn't have to break a sweat making any drastic conversions to give us our expected answer. If we instead give it something a little more challenging:

SELECT 4.4/CAST(.44 AS FLOAT)

2019-04-08-16-28-41

Here we still return the result of 10, however SQL Server had to do a little more work. We know from the previous example that 4.4 on its own is of type numeric, and in this example we are explicitly casting .44 to a float. SQL Server isn't able to perform the division operation directly on two different data types, so it refers to its data type precedence table to decide which value to convert to a matching datatype.

In the linked table above, the float data type appears higher on the list than numeric (synonym: decimal) data type. This causes SQL Server to convert our numeric 4.4 to a float before dividing.

While SQL Server is doing extra work behind the scenes that we didn't explicitly request, we can't be too angry with it since it still is giving us the "correct" answer.

"Incorrect" Conversions

Let's look at something a little more dastardly:

SELECT CAST(4.4 AS NUMERIC)/CAST(.44 AS FLOAT)

You might think this should also return 10 based on the previous example, but in fact it returns 9.090909:

2019-04-08-16-30-02

While we are still witnessing implicit conversion here (the numeric gets converted to a float in order to allow SQL Server to perform the division), we are also experiencing a case of default data type precision and scale. If we use the SQL_VARIANT_PROPERTY() function again to not only reveal base type but also precision and scale, we'll notice that when we let SQL Server "guess" the scale, it correctly chooses 1 decimal place, while when we use the default scale associated with numeric we get 0:

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4,'Precision'),
    SQL_VARIANT_PROPERTY(4.4,'Scale') 

SELECT 
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'BaseType'),
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'Precision'),
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'Scale')

2019-04-08-16-32-55

What this really means is that the decimal portion of 4.4 is getting chopped off, leaving us with an equation of 4 / .44 = 11.

Putting Everything Together

So back to our original example that returns 11:

SELECT 4.4/CAST(.44 AS VARCHAR(5))

What exactly is going on here? Well for starters, data type precedence is forcing SQL Server to convert the VARCHAR .44 to a numeric. But a numeric with what precision and scale?

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4,'Precision'),
    SQL_VARIANT_PROPERTY(4.4,'Scale') 

2019-04-08-16-33-43

Since SQL Server determines that our numerator is NUMERIC(2,1), it converts the denominator to a NUMERIC(2,1) as well. This means instead of dividing by .44, we end up dividing by .4 which results in 11:

-- An explicit version of the same calculation
SELECT CAST(4.4 AS NUMERIC(2,1))/CAST(.44 AS NUMERIC(2,1))

2019-04-08-16-34-30

SQL Server Isn't Wrong

While it's easy to blame SQL Server for not knowing how to perform simple mathematical operations, it's us the developers who are to blame. SQL Server is a piece of software following rules for how to handle unclear situations that we input into it.

Having SQL Server throw an error instead of assuming data types and implicitly converting data on our behalf would make things less ambiguous, but it would also make simple operations a lot more tedious (looking at you SSIS).

In reality, the best solution is to accurately define data types from initial creation and to always be explicit with data types when the value of your computations matters.

Testing with Temporary Stored Procedures

Watch this week's video on YouTube

A while back I learned that it's possible to create temporary stored procedures in SQL Server.

I never put that knowledge into practice however because I struggled to think of a good use case for when a temporary stored procedure would be preferable to a permanent stored procedure.

Not long ago I encountered a scenario where using a temporary stored procedure was the perfect solution to my problem.

Building New Tables

Recently, I had to build a new version of a legacy table because the legacy table's upstream data source was going to be retired.

The new table would contain all the same data as the legacy table, but populated from the new data source. Additionally, the new table would also include additional rows and columns. After building the final table, the plan was to create a view to replace the functionality of the legacy table.

I had to spend quite a bit of time writing a fairly elaborate query to make the data for the new data source match what was appearing in the legacy table (remember my gaps and islands post from a few weeks back? Imagine that on steroids).

Before building out the new ETL and tables though, I wanted to be able to test that the new query was producing the correct results. This was challenging because some of the data sources were on other servers. I was in a catch 22: couldn't test the data because I didn't build the ETL yet, but I didn't want to build the ETL until I tested the results.

Less Than Ideal Options

To make matters worse, I was only able to test my query using data in the production environment.

I thought of putting my new query into a stored procedure to make the logic easier to test. But with the production elevate process being length and restricted, I couldn't easily put my parameterized query into a permanent stored procedure to test my query with.

At this point I had a few options:

  • Don't test anything and just build out the tables and ETL in production. Cross my fingers and hope it all works, fixing any issues after the fact.
  • Create a permanent stored procedure with the query and elevate it to production. Hope that I don't have to make changes and go through the slow elevate process again.
  • Run the query over and over again with different parameters.

I didn't like the first two options because of the amount of time I would lose trying to elevate new tables or procedures into production.

The third option wasn't ideal either because while it would allow me to iterate quickly, documenting all of my tests would involve a massive file that would not be easy to navigate or change.

What I needed was a way to run a query through many different parameters in a concise manner without making any permanent production changes.

Temporary Stored Procedures for Regression Testing

Since I have access to create temporary stored procedures in production, I was able to create a temporary procedure containing my complex query:

CREATE PROCEDURE #ComplexBusinessLogic
  @parm1 int
AS
BEGIN
  /* This isn't the actual query.  The real one was ugly and hundreds of lines long.*/
  SELECT CASE @parm1
    WHEN 1 THEN 'A'
    WHEN 2 THEN 'B'
    WHEN 3 THEN 'C'
    END AS ResultValue
END;

As I mention in the comment of the stored procedure, this query was huge. If I wanted to execute it multiple times with different parameter values to test with, I'd have a gigantic file that would be difficult to navigate and easy to make errors in.

Instead, the temporary stored procedure made it easy to document my tests and execute the query as many times as needed:

/* Test for when parameter is 1 */
EXEC #ComplexBusinessLogic @parm1 = 1;

/* Test for scenario 2 */
EXEC #ComplexBusinessLogic @parm1 = 2;

/* The rarely occuring but very important test scenario 3 */
EXEC #ComplexBusinessLogic @parm1 = 3

I was able then to clearly define all of my tests and run them against the production data without creating any permanent production objects. This was great because I did find errors with my logic, but I was able to fix them and retry my tests over and over again until everything ran without issues.

Testing in Production

Ideally I wouldn't have to use this solution. It would have been much better to have data to test with in a non-production environment. Using a temporary stored procedure to test in production is a hack to get around environment restrictions.

However, what is ideal and what is real-world doesn't always align. Sometimes a hack helps meet deadlines when better options aren't available. In this instance, temporary stored procedures helped make testing a breeze.