SQL Server's "Wrong" Math

Published Tue 24 September 2019 in SQL > Development

A couple of weeks ago I decided to rebuild my recording studio by getting rid of my fabric backdrop and replacing it with a true wall instead. Doing this would allow me more flexibility when shooting, further improving my filming process efficiency.

To determine how much lumber I would need for building the new walls, I decided to write a SQL query to help with my framing calculations. I was building a 6 foot wall and wanted to put a stud every 16 inches. Easy enough to do the math on this:

SELECT (6*12)/16

The output of the query above was 4, indicating the number of studs I would need for one wall section.

What's interesting is that if we do this same equation in a calculator, we get a slightly different answer: 4.5.

And while I didn't end up framing my walls incorrectly, had I trusted the output of my query I would have ended up with some incorrectly sized walls.

Is SQL Server Bad At Simple Math?

What happened? Well it all has to do with how SQL Server handles calculations.

While tweeting about my studio rebuild process, Kenneth Fisher from SQL Studies tweeted about what I would learn about SQL Server from rebuilding the studio.

https://twitter.com/sqlstudent144/status/1170376006370283520

Jokingly, I tweeted back the above SELECT (6*12)/16 example because it is funny (scary?) how SQL Server chops off the .5 if you don't understand what's going on.

When you perform calculations in SQL Server, it converts the operands of an expression to the data type with the highest precedence among them. In the above example, since all of the numbers we are dealing with are integers, SQL Server performs integer division and keeps the final answer as an integer, discarding whatever should have come after the decimal point.

The quick and dirty way to solve this is to include a datatype in the equation that allows for decimals and has a higher precedence than integer. Basically, convert one of the integers to a numeric by adding .0 to any of the values:

SELECT (6*12)/16.0

This will then return the expected result.
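
To see which data type each version of the expression produces, we can ask SQL_VARIANT_PROPERTY() for the base type of the result. This is only a small sketch, and the column aliases are my own naming:

```sql
-- Integer operands only: the result stays an int, so the .5 is dropped
SELECT
    (6*12)/16 AS IntResult,
    SQL_VARIANT_PROPERTY((6*12)/16, 'BaseType') AS IntResultType;

-- One numeric operand: the int is converted to numeric before dividing
SELECT
    (6*12)/16.0 AS NumericResult,
    SQL_VARIANT_PROPERTY((6*12)/16.0, 'BaseType') AS NumericResultType;
```

The first query should report int and the second numeric, which is the data type precedence rule at work.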

Following up on Twitter, Andy Mallon mentions that you don't even need the 0, simply adding . will suffice:

https://twitter.com/AMtwo/status/1170391334500388865

SELECT (6*12)/16.

Pat Phelan then took it a step further, saying you can use the e syntax if you want to get the same successful result but confuse your users:

https://twitter.com/YetAnotherSQL/status/1170400514716164096

SELECT (6*12)/16e0

Out of all of these methods, I prefer adding the .0 because it is the least ambiguous. For calculations that matter however, I also like to throw a CAST around individual values or the entire equation to be certain that I am getting a result with the precision and scale that I expect instead of letting SQL Server automatically guess for me:

SELECT CAST((6*12)/16.0 AS NUMERIC (2,1))

Yes, it's a few extra characters, but the intent is clear.
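
Casting individual values matters most when the numbers come from variables or columns, where there is no literal to append .0 to. A minimal sketch, assuming two hypothetical variables (@WallInches and @SpacingInches are names I made up for this example):

```sql
-- Hypothetical variables standing in for the wall length and stud spacing
DECLARE @WallInches int = 6 * 12;
DECLARE @SpacingInches int = 16;

-- Both operands are int, so this is still integer division
SELECT @WallInches / @SpacingInches AS IntegerDivision;

-- Casting one operand to numeric forces decimal division;
-- the outer CAST pins the precision and scale of the result
SELECT CAST(CAST(@WallInches AS NUMERIC(4,1)) / @SpacingInches AS NUMERIC(2,1)) AS DecimalDivision;
```

The first query returns 4 and the second returns 4.5.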

Why Does All of This Matter?

Implicit conversions and datatype precedence are something that most people starting with SQL Server are not aware of until they discover that their results are "wrong". If your queries require precise answers, then you have to be precise and explicit in how you handle the data (otherwise you might build a studio wall incorrectly!)

For more information about all these types of conversions, check out Andy's post on the subject which has even more fun examples.

Predicate Execution Order on Mixed Data Type Columns

Published Tue 21 May 2019 in SQL > Development

Watch this week's video on YouTube

SQL Server's cost-based query optimizer does a pretty good job of figuring out what order to filter your data to get fast query executions. It considers things like index coverage, data distribution, and much more to decide how to retrieve your query's data.

However, these good intentions can become problematic in certain situations where you know more about your data than SQL Server does. When this happens, the order SQL Server chooses to execute predicates is important not just for performance of your query but for the business logic as well.

A Column With Mixed Data Types

Let's look at the following example table and data:

USE master;
DROP DATABASE IF EXISTS MixedDatatypes;
CREATE DATABASE MixedDatatypes;
USE MixedDatatypes;
GO

CREATE TABLE dbo.Pages
(
    Id int identity,
    PageName varchar(20),
    DataValue varchar(100),
    DataType varchar(20),
    CONSTRAINT PK_Id PRIMARY KEY (Id)
);
GO
INSERT INTO dbo.Pages VALUES ('StringsOnlyPage 1','abc','string')
GO 2000
INSERT INTO dbo.Pages VALUES ('NumbersOnlyPage 1','1.20','decimal')
GO 2000
INSERT INTO dbo.Pages VALUES ('NumbersOnlyPage 2','1.20','decimal')
GO 2000
INSERT INTO dbo.Pages VALUES ('MixedDataTypesPage 1','abc','string')
GO 1000
INSERT INTO dbo.Pages VALUES ('MixedDataTypesPage 1','1.20','decimal')
GO 1000

This table stores data for an application that has many different types of Pages. Each Page stores different types of data, but instead of creating a separate table for each type, we store all the different data in the varchar DataValue column and maintain the original data type in the DataType column.

This structure reduces the complexity required for maintaining our database (compared to creating possibly hundreds of tables, one for each PageName) and makes querying easier (only need to query one table). However, this design could also lead to some unexpected query results.

Filtering Mixed Data Values

Let's say we want to retrieve all data from one Page where the value is 1.2:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName = 'NumbersOnlyPage 1' AND DataValue = '1.2'

This query runs fine, but it returns no rows. Since our original data type was a decimal with a value of 1.20, the string '1.20' does not equal the string '1.2', so this string-based comparison doesn't work. What we really want to have happen is a numeric comparison in our predicate:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName = 'NumbersOnlyPage 1' AND DataValue = 1.2

While the implicit conversion occurring on the table's DataValue column is not ideal, if the number of rows it needs to convert is small it's not so bad (plus, this isn't the point of today's post, so try and look past it for a few more moments).
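
The difference between the string comparison and the numeric comparison can be seen in isolation, without the table. This is just an illustrative sketch using literals:

```sql
-- String comparison: '1.20' and '1.2' are different strings
SELECT CASE WHEN '1.20' = '1.2' THEN 'match' ELSE 'no match' END AS StringComparison;

-- Numeric comparison: the varchar is implicitly converted to numeric first
SELECT CASE WHEN '1.20' = 1.2 THEN 'match' ELSE 'no match' END AS NumericComparison;
```

The first query reports no match, while the second reports a match because the varchar side has the lower data type precedence and gets converted before the comparison.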

Here comes the fun: what if we want to check all our Pages that contain numeric data for values of 1.2? We could write this query in a couple of different ways:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2
--or
SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName in ('NumbersOnlyPage 1','NumbersOnlyPage 2') AND DataValue = 1.2

For both queries, we receive the error "Error converting data type varchar to numeric".

Why? In this case SQL Server decides to do the implicit conversions on the DataValue column first before filtering on our PageName column.

Up until this last query, SQL Server was deciding that it would be more efficient to filter the rows down to the specific Page first and then do the implicit conversions on the DataValue column. However, now that we are selecting more than one Page, SQL Server determines that since it has to scan everything anyway, it might as well do all of the implicit conversions first and filter on Page names later.

The problem of course is that not all of our DataValue values are numeric. In this case the order of the predicates does matter, not for performance but to be able to correctly execute the business logic that we defined as part of our query.
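
The failure itself is easy to reproduce without the table. The sketch below uses explicit CASTs to mimic what the implicit conversion does to each row's DataValue:

```sql
-- Works: the string holds a valid number
SELECT CAST('1.20' AS numeric(2,1)) AS ConvertedValue;

-- Fails with "Error converting data type varchar to numeric."
SELECT CAST('abc' AS numeric(2,1)) AS ConvertedValue;
```

As soon as SQL Server attempts the conversion on a single 'abc' row, the whole query fails, even if that row would have been filtered out by the PageName predicate.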

Not Good Solutions

One way we can fix this is to tempt SQL Server to filter on PageName first by adding an index:

CREATE NONCLUSTERED INDEX IX_PageName ON dbo.Pages (PageName) INCLUDE (DataValue);

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2

This works great. SQL Server decides that since this index covers all the fields in our query, and because the index key is PageName, it will filter the rows on PageName first and perform the implicit conversions on the remaining rows.

The problem with this is that it's not guaranteed. Something may happen that will cause SQL Server to stop using this index in the future: our index doesn't cover our query anymore, we add some additional filtering, the index is removed so it can be replaced by a different index that will no longer be selected for this particular query, etc.

It just isn't a reliable option.

Plus it doesn't work in all scenarios. Let's say we parameterize the PageName and use the STRING_SPLIT() function to filter our Pages to only those passed in:

DECLARE @PageNames varchar(100) = 'NumbersOnlyPage 1,NumbersOnlyPage 2';
SELECT *
FROM dbo.Pages
WHERE PageName in (SELECT value FROM string_split(@PageNames,',')) AND DataValue = 1.2

We are back to square one since in this case STRING_SPLIT() needs to parse the PageName data first and then join it in with the rest of the data, causing our original failure scenario (this is the estimated execution plan):

[Screenshot: estimated execution plan]

Other Options

So while indexing seems to fix the problem, it's not guaranteed to work 100% of the time.

Obviously we could choose not to store data in this format, but that would add complexity to the database and app.

We could try to add the PageName filter into a derived table and force the join order, but that's ugly and will force us to read the table multiple times.

Since we also have data type information available for each row, we might consider utilizing that information:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValue = 1.2 AND DataType = 'decimal'

But once again if this works it's through sheer luck.

TRY_CONVERT() is another option. This function returns NULL if it can't convert a value to a decimal:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND TRY_CONVERT(decimal(2,1),DataValue) = 1.2

This is actually a pretty good option since it's guaranteed to work regardless of which column SQL Server filters on first. The trade-off is that the conversion has to run against every row SQL Server evaluates, so if the number of DataValues you have to TRY and CONVERT is relatively small, this may be your best choice.
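
To see why TRY_CONVERT() is safe no matter which predicate runs first, here is how it treats the two kinds of values stored in our table (a quick sketch using literals rather than the table itself):

```sql
SELECT
    TRY_CONVERT(decimal(2,1), '1.20') AS NumericString,    -- converts to 1.2
    TRY_CONVERT(decimal(2,1), 'abc')  AS NonNumericString; -- NULL instead of an error
```

Since NULL = 1.2 never evaluates to true, the non-numeric rows are quietly filtered out instead of failing the query.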

For better performance, you can create a second column that contains data in decimal (or any other type) format:

ALTER TABLE dbo.Pages
ADD DataValueDecimal AS TRY_CONVERT(decimal(2,1),DataValue) PERSISTED

You could index both DataValue* columns and your performance would be pretty good. The downside here of course is that your app queries will have to change to match the new table structure:

SELECT PageName,DataValue
FROM dbo.Pages
WHERE PageName like 'NumbersOnlyPage%' AND DataValueDecimal = 1.2
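
The indexing mentioned above might look something like the following. Treat this as a sketch only: the index names are hypothetical, it assumes the DataValueDecimal computed column has already been added, and the right key and included columns depend on your own queries:

```sql
-- Hypothetical index on the persisted computed column for numeric lookups
CREATE NONCLUSTERED INDEX IX_Pages_DataValueDecimal
    ON dbo.Pages (DataValueDecimal) INCLUDE (PageName, DataValue);

-- Hypothetical index on the original varchar column for string lookups
CREATE NONCLUSTERED INDEX IX_Pages_DataValue
    ON dbo.Pages (DataValue) INCLUDE (PageName);
```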

In conclusion, it's tough to say what the best option is for this type of scenario. However, it's important to keep in mind that if you decide to structure and write your queries in this format, you need to plan for order of operation issues and handle errors gracefully.

Data Type Precedence and Implicit Conversions

Published Tue 09 April 2019 in SQL > Development

Watch this week's video on YouTube

SQL Server needs to make sure data types match when performing operations that involve multiple pieces of data.

When the data types do not match, SQL Server has to implicitly convert the data before performing any operations.

While most of the time these implicit conversions go unnoticed, they are important to understand since they can lead to unexpected results.

When 4.4/.44 Doesn't Equal 10

Let's start with this example:

SELECT 4.4/CAST(.44 AS VARCHAR(5))

Ignoring for a moment that our denominator is of type VARCHAR, if we do some quick mental math or use a calculator, we can see that the answer should be 10:

However, if we look at the result SQL Server returns, it's strangely 11:

To understand why this happens, we need to understand SQL Server's data type precedence logic.

Data Type Precedence

If we start with a simpler version of this example, we'll see SQL Server does in fact know how to perform math and return an answer of 10:

SELECT 4.4/.44

We can use the SQL_VARIANT_PROPERTY() function to see what data types SQL Server is assuming we are using in our calculation:

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(.44,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4/.44,'BaseType')

In this case, since the data types of both components are numeric, SQL Server doesn't have to break a sweat making any drastic conversions to give us our expected answer. If we instead give it something a little more challenging:

SELECT 4.4/CAST(.44 AS FLOAT)

Here we still return the result of 10, however SQL Server had to do a little more work. We know from the previous example that 4.4 on its own is of type numeric, and in this example we are explicitly casting .44 to a float. SQL Server isn't able to perform the division operation directly on two different data types, so it refers to its data type precedence table to decide which value to convert to a matching datatype.

In the linked table above, the float data type appears higher on the list than numeric (synonym: decimal) data type. This causes SQL Server to convert our numeric 4.4 to a float before dividing.
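
We can confirm the conversion with the same SQL_VARIANT_PROPERTY() approach used earlier. This is a small sketch, and the column aliases are my own:

```sql
SELECT
    SQL_VARIANT_PROPERTY(4.4, 'BaseType')                    AS NumeratorType,
    SQL_VARIANT_PROPERTY(CAST(.44 AS FLOAT), 'BaseType')     AS DenominatorType,
    SQL_VARIANT_PROPERTY(4.4/CAST(.44 AS FLOAT), 'BaseType') AS ResultType;
```

The numerator reports numeric, the denominator reports float, and the result of the division reports float, showing that the numeric operand was the one converted.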

While SQL Server is doing extra work behind the scenes that we didn't explicitly request, we can't be too angry with it since it still is giving us the "correct" answer.

"Incorrect" Conversions

Let's look at something a little more dastardly:

SELECT CAST(4.4 AS NUMERIC)/CAST(.44 AS FLOAT)

You might think this should also return 10 based on the previous example, but in fact it returns 9.090909:

While we are still witnessing implicit conversion here (the numeric gets converted to a float in order to allow SQL Server to perform the division), we are also experiencing a case of default data type precision and scale. If we use the SQL_VARIANT_PROPERTY() function again to not only reveal base type but also precision and scale, we'll notice that when we let SQL Server "guess" the scale, it correctly chooses 1 decimal place, while when we use the default scale associated with numeric we get 0:

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4,'Precision'),
    SQL_VARIANT_PROPERTY(4.4,'Scale') 

SELECT 
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'BaseType'),
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'Precision'),
    SQL_VARIANT_PROPERTY(CAST(4.4 AS NUMERIC),'Scale')

What this really means is that the decimal portion of 4.4 is getting chopped off, leaving us with an equation of 4 / .44 = 9.090909.
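
Here is that default scale behavior in isolation, along with one way to avoid it by stating the precision and scale explicitly. The aliases are just illustrative names:

```sql
-- NUMERIC with no arguments defaults to NUMERIC(18,0), so the .4 is lost
SELECT CAST(4.4 AS NUMERIC) AS DefaultScale;

-- Stating the precision and scale keeps the decimal portion
SELECT CAST(4.4 AS NUMERIC(2,1)) AS ExplicitScale;

-- With the numerator intact, the division is back to approximately 10
SELECT CAST(4.4 AS NUMERIC(2,1))/CAST(.44 AS FLOAT) AS Result;
```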

Putting Everything Together

So back to our original example that returns 11:

SELECT 4.4/CAST(.44 AS VARCHAR(5))

What exactly is going on here? Well for starters, data type precedence is forcing SQL Server to convert the VARCHAR .44 to a numeric. But a numeric with what precision and scale?

SELECT 
    SQL_VARIANT_PROPERTY(4.4,'BaseType'),
    SQL_VARIANT_PROPERTY(4.4,'Precision'),
    SQL_VARIANT_PROPERTY(4.4,'Scale')

Since SQL Server determines that our numerator is NUMERIC(2,1), it converts the denominator to a NUMERIC(2,1) as well. This means instead of dividing by .44, we end up dividing by .4 which results in 11:

-- An explicit version of the same calculation
SELECT CAST(4.4 AS NUMERIC(2,1))/CAST(.44 AS NUMERIC(2,1))
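
If the denominator really does have to arrive as a string, one possible fix is to convert it ourselves with enough scale to hold both decimal places. A minimal sketch, assuming NUMERIC(3,2) is wide enough for the values involved:

```sql
-- The implicit conversion behaves like this explicit one: '0.44' becomes 0.4
SELECT CAST('0.44' AS NUMERIC(2,1)) AS RoundedDenominator;

-- Converting the string with enough scale keeps both decimal places,
-- so the division evaluates to 10
SELECT 4.4/CAST('0.44' AS NUMERIC(3,2)) AS Result;
```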

SQL Server Isn't Wrong

While it's easy to blame SQL Server for not knowing how to perform simple mathematical operations, it's us the developers who are to blame. SQL Server is a piece of software following rules for how to handle unclear situations that we input into it.

Having SQL Server throw an error instead of assuming data types and implicitly converting data on our behalf would make things less ambiguous, but it would also make simple operations a lot more tedious (looking at you SSIS).

In reality, the best solution is to accurately define data types from initial creation and to always be explicit with data types when the value of your computations matters.

Unexpected Shorthand Date Implicit Conversions

Published Tue 23 October 2018 in SQL > Development > Dates

I do my best work in the mornings. Evenings are pretty good too once I get a second wind.

Late afternoons are my nemesis for getting any serious technical or creative work done. Usually I reserve that time for responding to emails, writing documentation, and brewing coffee.

Some afternoons I can't help myself though and end up getting myself into trouble.

What is THAT!?

It all started when I was troubleshooting an existing query that was having issues. During the process of trying to understand what the query was doing, I happened to look at the execution plan:

CREATE TABLE ##DatetimeTest ( SomeField varchar(50) NULL, CreateDatetime datetime);
INSERT INTO ##DatetimeTest VALUES ('asdf',GETDATE());

DECLARE @Today datetime = GETDATE();
SELECT DISTINCT
    *
FROM 
    ##DatetimeTest
WHERE
    CreateDatetime >= @Today-30;

Simplified, but you get the point.

When I hovered over the Table Scan, the Predicate section caught my eye. Specifically, I wondered "Why is 1900-01-31 showing up? I don't have that anywhere in my query!"

(fun side story: the real query I was working on was dealing with user-defined datatypes, something I don't have experience with, so I thought those had something to do with the problems I was experiencing. I latched onto this 1900 date as the potential cause but it ended up being a red herring!)

Help!

Like I mentioned, late afternoons are not when I do my best work. I couldn't figure out why SQL Server was converting my -30 to January 31, 1900.

Intrigued and having no clue what was going on, I decided to post the question with the #sqlhelp hashtag on Twitter. Fortunately for me, Aaron Bertrand, Jason Leiser, and Thomas LaRock all came to the rescue with ideas and answers - thanks guys!

Implicit Conversion

In hindsight, the answer is obvious: the 30 implicitly converts to a datetime (the type of my @Today variable), in this case 30 days after the datetime base date of 1900-01-01, and is then subtracted from @Today.

This makes perfect sense: SQL Server needs to do some math and in order to do so it first needs to make sure both datatypes in the equation match. Since int readily converts to datetime but not the other way around, SQL Server was just doing its job.
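
The conversion is easy to see on its own. In this quick sketch, 0 maps to the datetime base date and every whole number after that adds one day:

```sql
SELECT
    CAST(0 AS datetime)  AS BaseDate,      -- 1900-01-01 00:00:00.000
    CAST(30 AS datetime) AS ThirtyDaysOn;  -- 1900-01-31 00:00:00.000
```

That second value is exactly the 1900-01-31 that showed up in the execution plan's predicate.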

Future Problems

As I mentioned earlier, this int to datetime conversion wasn't the actual issue with my query - in my drowsy state I mistook it as being the source of my problem.

And while it wasn't a problem this time, it can become a problem in the future.

Aaron has an excellent article on the problems with shorthand date math, but the most relevant future issue with my query is: what if someone in the future decides to update all datetimes to datetime2s (datetime2 being Microsoft's recommended datatype for new work)?

If we update to datetime2 and run the query again:

ALTER TABLE ##DatetimeTest
ALTER COLUMN CreateDatetime datetime2;

DECLARE @Today datetime2 = GETDATE();
SELECT DISTINCT
    *
FROM 
    ##DatetimeTest
WHERE
    CreateDatetime >= @Today-30;

Everything is broken

AHHH! While SQL Server had no problem converting our previous code between datetime and int, it's not so happy about converting datetime2: the query now fails with an operand type clash between datetime2 and int.

Morals

In the end, the above scenario had nothing to do with the actual problem I had on hand (which had to do with some operator precedence confusion).

To avoid future confusion and problems it's still better to refactor the code to be explicit with what you want to do by using the DATEADD() function:

DECLARE @Today datetime = GETDATE();
SELECT DISTINCT
    *
FROM 
    ##DatetimeTest
WHERE
    CreateDatetime >= DATEADD(day, -30, @Today);
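
The same DATEADD() approach should keep working if the column and variable are later changed to datetime2, which is the scenario that broke the shorthand version. A sketch of that variant, assuming the ##DatetimeTest column has already been altered to datetime2:

```sql
-- SYSDATETIME() returns a datetime2, matching the variable's type
DECLARE @Today datetime2 = SYSDATETIME();
SELECT DISTINCT
    *
FROM
    ##DatetimeTest
WHERE
    CreateDatetime >= DATEADD(day, -30, @Today);
```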

Pinal Dave Helps Me Fix My Performance Tuning Problems

Published Tue 24 July 2018 in SQL > Performance Tuning

Watch this week's video on YouTube

This week I was fortunate enough to film a video in collaboration with Pinal Dave, the SQL Authority himself. Pinal is creative, hilarious, and kind; making this video with him was A BLAST!

Although the video is a little tongue in cheek, Pinal's recommendations are very real: I've encountered plenty of scenarios where these solutions fixed slow queries. Will these recommendations fix the problem in every situation? Of course not, but they are a great place to start.

Instead of creating a text version of the concepts covered in the video (you should really watch it), I thought it would be fun to do a behind-the-scenes narrative of how the video came together because it is unlike any other project I've done before.

The Idea

After agreeing to make a video together, we tossed around a few ideas. Because we live in different time zones, we thought it would be fun to do something where I kept waking Pinal up in the middle of the night.

We iterated over what SQL Server examples to use (originally the second example was going to show my queries running out of space because of autogrowth being turned off). We also ended up adding another example after my wife suggested that having it build to three scenarios instead of two would be funnier - I agree!

Asynchronous Filming

You've probably already figured it out, but I didn't really wake Pinal up in the video (honestly, I think midnight would be too early to wake him up anyway; in our back and forth emails, I was seeing responses from him that were in the 1-2am range).

I filmed a preliminary version of my parts of the video, very roughly edited them together, and sent it over to Pinal.

He then filmed his segments, giving me lots of great footage (I'm not sure if it was ad-libbed or not, but I was dying of laughter when watching through his clips).

Then I re-filmed my parts to try to match his dialog as closely as possible. Re-filming my parts also allowed me to self-edit and not ramble as much.

Everything Else

After that, it was just the usual process of editing, color correction, audio processing, etc...

I'm happy with how it turned out, especially given all of the technical challenges we had with filming separately.

Major thanks again to Pinal for being supportive and willing to make a fun SQL Server video. Enjoy!