SQL Server’s “Wrong” Math

Published on: 2019-09-24

Watch this week’s video on YouTube!

A couple of weeks ago I decided to rebuild my recording studio by getting rid of my fabric backdrop and replacing it with a true wall instead. Doing this would allow me more flexibility when shooting, further improving my filming process efficiency.

To determine how much lumber I would need for building the new walls, I decided to write a SQL query to help with my framing calculations. I was building a 6 foot wall and wanted to put a stud every 16 inches. Easy enough to do the math on this:

SELECT (6*12)/16
The answer is 4

The output of the query above was 4, indicating the number of studs I would need for one wall section.

What’s interesting is that if we do this same equation in a calculator, we get a slightly different answer: 4.5.

And while I didn’t end up framing my walls incorrectly, if I trusted the output of my query I would have had some incorrectly sized walls.

Is SQL Server Bad At Simple Math?

What happened? Well it all has to do with how SQL Server handles calculations.

While tweeting about my studio rebuild processKenneth Fisher from SQL Studies tweeted about what I would learn about SQL Server from rebuilding the studio.

Jokingly, I tweeted back the above SELECT (6*12)/16 example because it is funny (scary?) how SQL Server chops off the .5 if you don’t understand what’s going on.

When you perform calculations in SQL Server, it converts any expressions to the datatype that has the highest precedence. In the above example, since all of the numbers we are dealing with are integers, SQL Server keeps the final answer as an integer, apparently not caring what should have come after the decimal.

The quick and dirty way to solve this is to include a datatype in the equation that allows for decimals and has a higher precedence than integer. Basically, convert one of the integers to a numeric by adding .0 to any of the values:

SELECT (6*12)/16.0
4.500000

This will then return the expected result.

Following up on Twitter, Andy Mallon mentions that you don’t even need the 0, simply adding . will suffice:

SELECT * (6*12)/16.

Pat Phelan then took it a step further, saying you can use the e syntax if you want to get the same successful result but confuse your users:

SELECT * (6*12)/16e0

Out of all of these methods, I prefer adding the .0 because it is the least ambiguous. For calculations that matter however, I also like to throw a CAST around individual values or the entire equation to be certain that I am getting a result with the precision and scale that I expect instead of letting SQL Server automatically guess for me:

SELECT CAST((6*12)/16.0 AS NUMERIC (2,1))

Yes, it’s a few extra characters, but the intent is clear.

Why Does All of This Matter?

Implicit conversions and datatype precedence are something that most people starting with SQL Server are not aware of until they discover that their results are “wrong”. If your queries require precise answers, then you have to be precise and explicit in how you handle the data (otherwise you might build a studio wall incorrectly!)

For more information about all these types of conversions, check out Andy’s post on the subject which has even more fun examples.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

SQL Server Quick Tips

Published on: 2019-09-17

Earlier this year I made a series of videos that were played before the start of several DBA Fundamentals presentations. These videos gave viewers something SQL related to watch while the online rooms were filling up with attendees before the start of each day’s live presentation.

Since these videos only ran once, and because my regular filming location is under construction this week, I decided to republish them here today.

Hope you enjoy these quick tips on several SQL Server concepts.

STATISTICS IO, TIME

Enabling STATISTICS IO and TIME can help you benchmark your queries to see if your tuning efforts are helping or hindering execution performance:

Watch “STATISTICS IO, TIME” on YouTube.

Key Concepts

The terminology surrounding keys in relational databases can be confusing – in this overview we summarize the difference between primary keys, foreign keys, natural keys, and more:

Watch “Key Concepts” on YouTube.

SARGability

When performance tuning queries, one of first things to check is whether your queries are SARGable and allow the optimizer to seek to the data it needs:

Watch “SARGability” on YouTube.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Tracking Who Last Changed a Row

Published on: 2019-09-10

This post is a response to this month’s T-SQL Tuesday #118 prompt by Kevin Chant.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month’s topic asks about our fantasy SQL Server feature.


Watch this week’s episode on YouTube!

Data About The Data

Have you ever wondered who was the last person (or process) to modify a piece of data in your database?

SQL Server offers plenty of system views and functions that provide insight into how your server is running and how your queries are performing. However, it doesn’t offer much information about who last modified your data.

That’s why my ideal SQL Server feature would be to have a built-in way to audit who last modified a row of data and at what time.

Current Workarounds

Today, this sort of logging can be implemented in the application layer but that requires extra coding and time.

In SQL Server, temporal tables offer part of this functionality with their GENERATED ALWAYS FOR ROW START/END properties, but these only log a row created/row last modified date. There is no built-in way to log which user modified the data. The remaining temporal table functionality also adds unnecessary overhead if you don’t actually need to keep track of all of the data history.

Default constraints exist for helping insert default values for when a row was created and who the user was that created the row, but restricting unauthorized access to those fields as well as handling instances where data is updated is not as straight forward.

The closest thing to get this type of automatic logging in SQL Server today is to implement the functionality with triggers. Reinterpreting some of my code I wrote when discussing how to fake temporal tables with triggers, we come up with this:

DROP TABLE dbo.TestData;
CREATE TABLE dbo.TestData (
	Id int IDENTITY CONSTRAINT PK_Id PRIMARY KEY,
	ColA int,
	ColB int,
	ColC int,
	LastModifiedDate datetime2,
	LastModifiedUser nvarchar(30)
);
GO

CREATE TRIGGER AutoLogger ON dbo.TestData
AFTER INSERT,UPDATE
AS
BEGIN
SET NOCOUNT ON;

DECLARE 
	@CurrentDateTime datetime2 = SYSUTCDATETIME(),
	@CurrentUser nchar(30) = SYSTEM_USER

UPDATE t
SET
       LastModifiedDate = @CurrentDateTime,
	   LastModifiedUser = @CurrentUser
FROM
    dbo.TestData t
    INNER JOIN inserted i
        ON t.Id = i.Id;
END
GO

Now, any time a record is created or modified, we log the datetime and the user that performed the modification:

-- Regular insert
INSERT INTO dbo.TestData (ColA,ColB,ColC) VALUES (1,2,3);
-- Regular update
UPDATE dbo.Test SET ColA = 4 WHERE Id = 1;
SELECT * FROM dbo.TestData;

-- Questionable motive insert
INSERT INTO dbo.TestData (ColA,ColB,ColC,LastModifiedDate,LastModifiedUser) VALUES (1,2,3,'9999-12-31','NotMe');
-- Questionable motive update
UPDATE dbo.TestData SET LastModifiedDate = '9999-12-31', LastModifiedUser='NotMe' WHERE Id = 1;
SELECT * FROM dbo.TestData;
Logged dates and users

While this solution provides some of the functionality I want, it has many downsides. First, it utilizes a trigger which often gets overlooked (at least initially) when debugging issues, causing confusion and maintenance headaches.

Secondly, there is some overhead in having a trigger run after each and every insert and update. Transaction throughput gets limited since every INSERT/UPDATE on this table will trigger a follow up UPDATE.

Additionally, this solution is not automatic: it must be created individually on every table you want logging on.

Finally, this table now contains extra columns on the clustered index, columns that I don’t necessarily always want to be impacting my performance.

The Ideal Feature

I wish there was a database level option that allowed logging of who modified what row of data when. When turned on, it would automatically handle this logging logic (and maybe some more, like specifically what column in addition to what row was modified) without the need to set it up on individual tables or by using triggers.

Additionally, I would love if this data were not persisted on the table’s clustered index itself. If there were a way to store the data in a nonclustered index for that table only (kind of like a non-persisted computed column value gets stored) that would be ideal.

Finally, I would love if this meta data were populated asynchronously to not impact the performance of inserts/updates/deletes on the main table of data.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!