Multiple Identity Inserts

Watch this week's video on YouTube

This week I want to share something that surprised me about using SQL Server's SET IDENTITY_INSERT statement.

I started with two tables with identity columns defined:

CREATE TABLE dbo.[User]
(
    Id int identity,
    UserName varchar(40)
);
CREATE TABLE dbo.StupidQuestions
(
    Id bigint identity,
    UserId int,
    Question varchar(400)
);


INSERT INTO dbo.[User] (UserName) VALUES ('Jim');
INSERT INTO dbo.[User] (UserName) VALUES ('Jane');
INSERT INTO dbo.[User] (UserName) VALUES ('Jin');
INSERT INTO dbo.[User] (UserName) VALUES ('Joyce');

INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (1,'Is smooth peanut butter better than chunky?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (1,'Do I really need to backup my production databases?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (2,'How to grant developers SA access?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (3,'I''m getting an error about not being able to add any more indexes to my table - how do I increase the limit?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (4,'How can I include more than 32 columns in my index key?');
GO

I wanted to copy the data from these two tables into two other tables:

CREATE TABLE dbo.User_DEV
(
    Id int identity,
    UserName varchar(40)
);

CREATE TABLE dbo.StupidQuestions_DEV
(
    Id bigint identity,
    UserId int,
    Question varchar(400)
);

This would allow me to safely test some changes on these _DEV table copies without breaking my original tables.

The next step was to write a couple of INSERT INTO SELECT statements:

INSERT INTO dbo.User_DEV
SELECT Id,UserName FROM dbo.[User]

INSERT INTO dbo.StupidQuestions_DEV
SELECT Id,UserId,Question FROM dbo.StupidQuestions

And of course as soon as I executed them SQL Server threw an error stating that I can't INSERT data into tables containing identity columns without first enabling identity inserts:

An explicit value for the identity column in table 'dbo.User_DEV' can only be specified when a column list is used and IDENTITY_INSERT is ON.

Ok, simple enough to fix: we just need to do what the error message says and SET IDENTITY_INSERT ON for both tables:

SET IDENTITY_INSERT dbo.User_DEV ON;  
SET IDENTITY_INSERT dbo.StupidQuestions_DEV ON;  

And... it still didn't work:

IDENTITY_INSERT is already ON for table 'IdentityTest.dbo.User_DEV'. Cannot perform SET operation for table 'dbo.StupidQuestions_DEV'.

One at a time

Although I've probably moved data around like this hundreds (thousands?) of times before, I've never encountered this particular error.

Apparently SQL Server only allows one table to have the IDENTITY_INSERT property enabled at a time within each session. The solution therefore is straightforward: enable identity inserts and copy each table's data one at a time:

SET IDENTITY_INSERT dbo.User_DEV ON; 
INSERT INTO dbo.User_DEV (Id,UserName)
SELECT Id,UserName FROM dbo.[User];
SET IDENTITY_INSERT dbo.User_DEV OFF; 

SET IDENTITY_INSERT dbo.StupidQuestions_DEV ON;
INSERT INTO dbo.StupidQuestions_DEV (Id,UserId,Question)
SELECT Id,UserId,Question FROM dbo.StupidQuestions
SET IDENTITY_INSERT dbo.StupidQuestions_DEV OFF;

20/20

In hindsight, I think I've never encountered this error before because I normally use the the Export Data Wizard in SSMS or a dedicated SSIS package to move data around. Either of those options are typically easier than writing T-SQL to move data across servers or for repeatability for when I need to regularly refresh tables with test data.

However, when using either of those options I've never paid attention to the implementation details, causing me to assume I knew how SQL Server handles identity inserts.

Trailing Spaces in SQL Server

Watch this week's video on YouTube

A long time ago I built an application that captured user input. One feature of the application was to compare the user's input against a database of values.

The app performed this text comparison as part of a SQL Server stored procedure, allowing me to easily update the business logic in the future if necessary.

One day, I received an email from a user saying that the value they were typing in was matching with a database value that they knew shouldn't match. That is the day I discovered SQL Server's counter intuitive equality comparison when dealing with trailing space characters.

Padded white space

You are probably aware that the CHAR data type pads the value with spaces until the defined length is reached:

DECLARE @Value CHAR(10) = 'a'
SELECT
    @Value AS OriginalValue,
    LEN(@Value) AS StringLength,
    DATALENGTH(@Value) AS DataLength,
    CAST(@Value AS BINARY) AS StringToHex;

image

The LEN() function shows the number of characters in our string, while the DATALENGTH() function shows us the number of bytes used by that string.

In this case, DATALENGTH is equal to 10. This result is due to the padded spaces occurring after the character "a" in order to fill the defined CHAR length of 10. We can confirm this by converting the value to hexadecimal. We see the value 61 ("a" in hex) followed by nine "20" values (spaces).

If we change our variable's data type to VARCHAR, we'll see the value is no longer padded with spaces:

DECLARE @Value VARCHAR(10) = 'a'
SELECT
    @Value AS OriginalValue,
    LEN(@Value) AS StringLength,
    DATALENGTH(@Value) AS DataLength,
    CAST(@Value AS BINARY) AS StringToHex;

image-1

Given that one of these data types pads values with space characters while the other doesn't, what happens if we compare the two?

DECLARE 
    @CharValue CHAR(10) = '',
    @VarcharValue VARCHAR(10) = ''
SELECT
    IIF(@CharValue=@VarcharValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@CharValue) AS CharBytes,
    DATALENGTH(@VarcharValue) AS VarcharBytes

image-2

In this case SQL Server considers both values equal, even though we can confirm that the DATALENGTHs are different.

This behavior doesn't only occur with mixed data type comparisons however. If we compare two values of the same data type, with one value containing several space characters, we experience something...unexpected:

DECLARE 
    @NoSpaceValue VARCHAR(10) = '',
    @MultiSpaceValue VARCHAR(10) = '    '
SELECT
    IIF(@NoSpaceValue=@MultiSpaceValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes

image-3

Even though our two variables have different values (a blank compared to four space characters), SQL Server considers these values equal.

If we add a character with some trailing whitespace we'll see the same behavior:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a',
    @MultiSpaceValue VARCHAR(10) = 'a     '
SELECT
    IIF(@NoSpaceValue=@MultiSpaceValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes

image-4

Both values are clearly different, but SQL Server considers them to be equal to each other. Switching our equal sign to a LIKE operator changes things slightly:

DECLARE 
   @NoSpaceValue VARCHAR(10) = 'a',
   @MultiSpaceValue VARCHAR(10) = 'a     '
SELECT
   IIF(@NoSpaceValue LIKE @MultiSpaceValue,1,0) AS ValuesAreEqual,
   DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
   DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes

image-5

Even though I would think that a LIKE without any wildcard characters would behave just like an equal sign, SQL Server doesn't perform these comparisons the same way.

If we switch back to our equal sign comparison and prefix our character value with spaces we'll also notice a different result:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a',
    @MultiSpaceValue VARCHAR(10) = '    a'
SELECT
    IIF(@NoSpaceValue=@MultiSpaceValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes

image-6

SQL Server considers two values equal regardless of spaces occurring at the end of a string. Spaces preceding a string however, no longer considered a match.

What is going on?

ANSI

While counter intuitive, SQL Server's functionality is justified. SQL Server follows the ANSI specification for comparing strings, adding white space to strings so that they are the same length before comparing them. This explains the phenomena we are seeing.

It does not do this with the LIKE operator however, which explains the difference in behavior.

Comparisons when extra spaces matter

Let's say we want to do a comparison where the difference in trailing spaces matters.

One option is to use the LIKE operator as we saw a few examples back. This is not the typical use of the LIKE operator however, so be sure to comment and explain what your query is attempting to do by using it. The last thing you want is some future maintainer of your code to switch it back to an equal sign because they don't see any wild card characters.

Another option that I've seen is to perform a DATALENGTH comparison in addition to the value comparison:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a',
    @MultiSpaceValue VARCHAR(10) = 'a    '
SELECT
    IIF(@NoSpaceValue = @MultiSpaceValue AND DATALENGTH(@NoSpaceValue) = DATALENGTH(@MultiSpaceValue),1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes

image-7

This solution isn't right for every scenario however. For starters, you have no way of knowing if SQL Server will execute your value comparison or DATALENGTH predicate first. This could wreck havoc on index usage and cause poor performance.

A more serious problem can occur if you are comparing fields with different data types. For example, when comparing a VARCHAR to NVARCHAR data type, it's pretty easy to create a scenario where your comparison query using DATALENGTH will trigger a false positive:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a ',
    @MultiSpaceValue NVARCHAR(10) = 'a'
SELECT
    IIF(@NoSpaceValue = @MultiSpaceValue AND DATALENGTH(@NoSpaceValue) = DATALENGTH(@MultiSpaceValue),1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes

image-8

Here the NVARCHAR stores 2 bytes for every character, causing the DATALENGTHs of a single character NVARCHAR to be equal to a character + a space VARCHAR value.

The best thing to do in these scenarios is understand your data and pick a solution that will work for your particular situation.

And maybe trim your data before insertion (if it makes sense to do so)!

CHOOSE() in SQL Server

Watch this week's video on YouTube

While I know I don't utilize most of the features available in SQL Server, I like to think I'm at least aware that those features exist.

This week I found a blind-spot in my assumption however. Even though it shipped in SQL Server 2012, the SQL Server CHOOSE function is a feature that I think I'm seeing for the first time this past week.

CHOOSE is CASE

CHOOSE returns the n-th item from a comma-delimited list.

Whenever learning a new feature in SQL Server I try to think of a good demo I could build to test out the functionality. In this case the immediate example that came to mind was building something that would provide a lookup of values:

SELECT 
    [key],
    [value],
    [type],
    CHOOSE(type+1,'null','string','int','boolean','array','object') AS JsonType
FROM
    OPENJSON(N'{
        "Property1":null,
        "Property2":"a",
        "Property3":3,
        "Property4":false,
        "Property5":[1,2,"3"],
        "Property6":{
            "SubProperty1":"a"
        }
    }');

In this case, the OPENJSON function returns a "type" field that indicates the datatype of that particular JSON property's value. The issue is that the "type" column is numeric and I can never remember what type of data each number represents.

The above query solves this by using CHOOSE to create a lookup of values. Since OPENJSON returns results starting with 0, we need to use type+1 in order to get the 1-based CHOOSE function to work correctly:

2019-06-03-21-08-58

If we look at the CHOOSE function's scalar operator properties in the execution plan, we'll see that this function is just a fancy alias for a more verbose CASE statement:

[Expr1000] = Scalar Operator(
    CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(1) 
    THEN 'null' 
    ELSE 
        CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(2) 
        THEN 'string' 
        ELSE 
            CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(3) 
            THEN 'int' 
            ELSE 
                CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(4) 
                THEN 'boolean' 
                ELSE 
                    CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(5) 
                    THEN 'array' 
                    ELSE 
                        CASE WHEN (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[type],0)+(1))=(6) 
                        THEN 'object' 
                        ELSE NULL END 
                    END 
                END 
            END 
        END 
    END
)

The Set-Based Way

I think one of the reasons I've never used CHOOSE is because I would hate typing up all of those lookup values and trapping them in a SELECT statement, never to be used again.

Previously, I would have stored the lookup values in table and joined them with the OPENJSON results to accomplish the same end result:

DROP TABLE IF EXISTS #JsonType;
CREATE TABLE #JsonType
(
    Id tinyint,
    JsonType varchar(20),
    CONSTRAINT PK_JsonTypeId PRIMARY KEY CLUSTERED (Id)
);

INSERT INTO #JsonType VALUES (0,'null');
INSERT INTO #JsonType VALUES (1,'string');
INSERT INTO #JsonType VALUES (2,'int');
INSERT INTO #JsonType VALUES (3,'boolean');
INSERT INTO #JsonType VALUES (4,'array');
INSERT INTO #JsonType VALUES (5,'object');

SELECT 
    j.[key],
    j.[value],
    j.[type],
    t.JsonType
FROM
    OPENJSON(N'{
        "Property1":null,
        "Property2":"a",
        "Property3":3,
        "Property4":false,
        "Property5":[1,2,"3"],
        "Property6":{
                        "SubProperty1":"a"
                    }
    }') j
    INNER JOIN #JsonType t
        ON j.[type] = t.Id

While more initial setup is involved with this solution, it's more flexible long-term. With a centralized set of values, there's no need to update the CHOOSE function in all of your queries when you can update the values in a single lookup table.

And while I didn't bother performance testing it, by virtue of being a scalar function, CHOOSE will probably perform worse in many real-world scenarios when compared to the table-based lookup approach (eg. large datasets, parallel plans, etc...).

CHOOSE What Works For You

I'm not surprised that it took me this long to learn about the CHOOSE function: while a simplified way to write certain CASE statements, I can't think of many (any?) scenarios where I would prefer to use it over a CASE or a lookup-table solution.

Predicate Execution Order on Mixed Data Type Columns

Watch this week's video on YouTube

SQL Server's cost-based query optimizer does a pretty good job of figuring out what order to filter your data to get fast query executions. It considers things like index coverage, data distribution, and much more to decide how to retrieve your query's data.

However, these good intentions can become problematic in certain situations where you know more about your data than SQL Server does. When this happens, the order SQL Server chooses to execute predicates is important not just for performance of your query but for the business logic as well.

A Column With Mixed Data Types

Let's look at the following example table and data:

USE master;
DROP DATABASE IF EXISTS MixedDataTypes;
CREATE DATABASE MixedDatatypes;
USE MixedDatatypes;
GO

CREATE TABLE dbo.Pages
(
    Id int identity,
    PageName varchar(20),
    DataValue varchar(100),
    DataType varchar(20),
    CONSTRAINT PK_Id PRIMARY KEY (Id)
);
GO
INSERT INTO dbo.Pages VALUES ('StringsOnlyPage 1','abc','string')
GO 2000
INSERT INTO dbo.Pages VALUES ('NumbersOnlyPage 1','1.20','decimal')
GO 2000
INSERT INTO dbo.Pages VALUES ('NumbersOnlyPage 2','1.20','decimal')
GO 2000
INSERT INTO dbo.Pages VALUES ('MixedDataTypesPage 1','abc','string')
GO 1000
INSERT INTO dbo.Pages VALUES ('MixedDataTypesPage 1','1.20','decimal')
GO 1000

This table stores data for an application that has many different types of Pages. Each Page stores different types of data, but instead of creating a separate table for each type, we store all the different data in the varchar DataValue column and maintain the original data type in the DataType column.

This structure reduces the complexity required for maintaining our database (compared to creating possibly hundreds of tables, one for each PageName) and makes querying easier (only need to query one table). However, this design could also lead to some unexpected query results.

Filtering Mixed Data Values

Let's say we want to retrieve all data from one table with where the value is 1.2:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName = 'NumbersOnlyPage 1' AND DataValue = '1.2'

This query runs fine. The problem is since our original data type was a decimal with a value of 1.20, this string-based comparison doesn't work. What we really want to have happen is a numeric comparison in our predicate:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName = 'NumbersOnlyPage 1' AND DataValue = 1.2

2019-05-20-20-34-01

While the implicit conversion occurring on the table's DataValue column is not ideal, if the number of rows it needs to convert is small it's not so bad (plus, this isn't the point of today's post, so try and look past it for a few more moments).

Here comes the fun: what if we want to check all our Pages that contain numeric data for values of 1.2? We could write this query in a couple of different ways:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2
--or
SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName in ('NumbersOnlyPage 1','NumbersOnlyPage 2') AND DataValue = 1.2

For both queries, we receive the error "Error converting data type varchar to numeric".

2019-05-20-20-34-40

Why? In this case SQL Server decides to do the implicit conversions on the DataValue column first before filtering on our PageName columns.

Up until this last query, SQL Server was deciding that it would be more efficient to filter the rows down to the specific Page first and then do the implicit conversions on the DataValue column. However, now that we are selecting more than one table, SQL Server says determines it has to scan everything anyway, it might as well do all of the implicit conversions first and filter on table names later.

The problem of course is that all our DataValue values are not numeric. In this case the order of the predicates does matter, not for performance but to be able to correctly execute the business logic that we defined as part of our query.

Not Good Solutions

One way we can fix this is to tempt SQL Server to filter on PageName first by adding an index:

CREATE NONCLUSTERED INDEX IX_PageName ON dbo.Pages (PageName) INCLUDE (DataValue);

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2

This works great. SQL Server decides that since this index covers all the fields in our query, and because the index key is PageName, it will filter the rows on PageName first and perform the implicit conversions on the remaining rows.

The problem with this is that it's not guaranteed. Something may happen that will cause SQL Server not use this index in the future: our index doesn't cover our query anymore, we add some additional filtering, the index is removed so it can be replaced by a different index that will no longer be selected for this particular query, etc...

It just isn't a reliable option.

Plus it doesn't work in all scenarios. Let's say we parameterize the PageName and use the STRING_SPLIT() function to filter our Pages to only those passed in:

DECLARE @PageNames varchar(100) = 'NumbersOnlyPage 1,NumbersOnlyPage 2';
SELECT *
FROM dbo.Pages
WHERE PageName in (SELECT value FROM string_split(@PageNames,',')) AND DataValue = 1.2

We are back to square one since in this case STRING_SPLIT() needs to parse the PageName data first and then join it in with the rest of the data, causing our original failure scenario (this is the estimated execution plan):

2019-05-20-20-36-08

Other Options

So while indexing seems to fix the solution, it's not guaranteed to work 100% of the time.

Obviously we could not store data in this format, but that would add complexity to the database and app.

We could try to add the PageName filter into a derived table and force the join order, but that's ugly and will force us to read the table multiple times.

Since we also have data type information available for each row, we might consider utilizing that information:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2 AND DataType = 'decimal'

But once again if this works it's through sheer luck.

TRY_CONVERT() is another option. This function returns nulls if it can't convert to a decimal:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND TRY_CONVERT(decimal(2,1),DataValue) = 1.2 

This is actually a pretty good option since it's guaranteed to work regardless of which column SQL Server filters on first. If the number of DataValues you have to TRY and CONVERT is relatively small though, this may be your best choice.

For better performance, you can create a second column that contains data in decimal (or any other type) format:

ALTER TABLE dbo.Pages
ADD DataValueDecimal AS TRY_CONVERT(decimal(2,1),DataValue) PERSISTED

You could index both DataValue* columns and your performance would be pretty good. The downside here of course is that your app queries will have to change to match the new table structure:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND CASE WHEN DataType = 'decimal' THEN DataValueDecimal ELSE DataValue END = 1.2 

In conclusion, it's tough to say what the best option is for this type of scenario. However, it's important to keep in mind that if you decide to structure and write your queries in this format, you need to plan for order of operation issues and handle errors gracefully.

Data Type Precedence and Implicit Conversions

Watch this week's video on YouTube

SQL Server needs to make sure data types match when performing operations that involve multiple pieces of data.

When the data types do not match, SQL Server has to implicitly convert the data before performing any operations.

While most of the time these implicit conversions go unnoticed, they are important to understand since they can lead to unexpected results.

When 4.4/.44 Doesn't Equal 10

Let's start with this example:

SELECT 4.4/CAST(.44 AS VARCHAR(5))

Ignoring for a moment that our denominator is of type VARCHAR, if we do some quick mental math or use a calculator, we can see that the answer should be 10:

Simple-division

However, if we look at the result SQL Server returns, it's strangely 11:

2019-04-08-16-24-28

To understand why this happens, we need to understand SQL Server's data type precedence logic.

Data Type Precedence

If we start with a simpler version of this example, we'll see SQL Server does in fact know how to perform math and return an answer of 10:

SELECT 4.4/.44

2019-04-08-16-27-16

We can use the SQL_VARIANT_PROPERTY() function to see what data types SQL Server is assuming we are using in our calculation:

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(.44,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4/.44,'BaseType')

2019-04-08-16-27-58

In this case, since the data types of both components are numeric, SQL Server doesn't have to break a sweat making any drastic conversions to give us our expected answer. If we instead give it something a little more challenging:

SELECT 4.4/CAST(.44 AS FLOAT)

2019-04-08-16-28-41

Here we still return the result of 10, however SQL Server had to do a little more work. We know from the previous example that 4.4 on its own is of type numeric, and in this example we are explicitly casting .44 to a float. SQL Server isn't able to perform the division operation directly on two different data types, so it refers to its data type precedence table to decide which value to convert to a matching datatype.

In the linked table above, the float data type appears higher on the list than numeric (synonym: decimal) data type. This causes SQL Server to convert our numeric 4.4 to a float before dividing.

While SQL Server is doing extra work behind the scenes that we didn't explicitly request, we can't be too angry with it since it still is giving us the "correct" answer.

"Incorrect" Conversions

Let's look at something a little more dastardly:

SELECT CAST(4.4 AS NUMERIC)/CAST(.44 AS FLOAT)

You might think this should also return 10 based on the previous example, but in fact it returns 9.090909:

2019-04-08-16-30-02

While we are still witnessing implicit conversion here (the numeric gets converted to a float in order to allow SQL Server to perform the division), we are also experiencing a case of default data type precision and scale. If we use the SQL_VARIANT_PROPERTY() function again to not only reveal base type but also precision and scale, we'll notice that when we let SQL Server "guess" the scale, it correctly chooses 1 decimal place, while when we use the default scale associated with numeric we get 0:

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4,'Precision'),
    SQL_VARIANT_PROPERTY(4.4,'Scale') 

SELECT 
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'BaseType'),
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'Precision'),
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'Scale')

2019-04-08-16-32-55

What this really means is that the decimal portion of 4.4 is getting chopped off, leaving us with an equation of 4 / .44 = 11.

Putting Everything Together

So back to our original example that returns 11:

SELECT 4.4/CAST(.44 AS VARCHAR(5))

What exactly is going on here? Well for starters, data type precedence is forcing SQL Server to convert the VARCHAR .44 to a numeric. But a numeric with what precision and scale?

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4,'Precision'),
    SQL_VARIANT_PROPERTY(4.4,'Scale') 

2019-04-08-16-33-43

Since SQL Server determines that our numerator is NUMERIC(2,1), it converts the denominator to a NUMERIC(2,1) as well. This means instead of dividing by .44, we end up dividing by .4 which results in 11:

-- An explicit version of the same calculation
SELECT CAST(4.4 AS NUMERIC(2,1))/CAST(.44 AS NUMERIC(2,1))

2019-04-08-16-34-30

SQL Server Isn't Wrong

While it's easy to blame SQL Server for not knowing how to perform simple mathematical operations, it's us the developers who are to blame. SQL Server is a piece of software following rules for how to handle unclear situations that we input into it.

Having SQL Server throw an error instead of assuming data types and implicitly converting data on our behalf would make things less ambiguous, but it would also make simple operations a lot more tedious (looking at you SSIS).

In reality, the best solution is to accurately define data types from initial creation and to always be explicit with data types when the value of your computations matters.

Page 1 / 3 »