
SQL Server Quick Tips

Earlier this year I made a series of videos that were played before the start of several DBA Fundamentals presentations. These videos gave viewers something SQL-related to watch while the online rooms were filling up with attendees before the start of each day's live presentation.

Since these videos only ran once, and because my regular filming location is under construction this week, I decided to republish them here today.

Hope you enjoy these quick tips on several SQL Server concepts.

STATISTICS IO, TIME

Enabling STATISTICS IO and TIME can help you benchmark your queries to see if your tuning efforts are helping or hindering execution performance:

Watch this video on YouTube
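If you've never used these options, here is the minimal pattern (the query in the middle is just a stand-in for whatever you want to measure):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Replace with the query you want to benchmark
SELECT COUNT(*) FROM sys.objects;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;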

Key Concepts

The terminology surrounding keys in relational databases can be confusing - in this overview we summarize the difference between primary keys, foreign keys, natural keys, and more:

Watch this video on YouTube
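As a quick reference, here is a minimal sketch showing a few of these key types side by side (all table and column names are made up for illustration):

CREATE TABLE dbo.Customer
(
    -- Surrogate primary key: uniquely identifies each row
    Id int IDENTITY CONSTRAINT PK_Customer PRIMARY KEY,
    -- Natural key: a real-world unique value, enforced with a unique constraint
    Email varchar(320) CONSTRAINT UQ_Customer_Email UNIQUE
);

CREATE TABLE dbo.CustomerOrder
(
    Id int IDENTITY CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    -- Foreign key: every order must reference an existing customer
    CustomerId int CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY REFERENCES dbo.Customer (Id)
);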

SARGability

When performance tuning queries, one of the first things to check is whether your queries are SARGable and allow the optimizer to seek to the data it needs:

Watch this video on YouTube
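To see the idea in action, compare these two versions of the same date filter against a hypothetical dbo.Users table with an index on CreatedDate; only the second allows an index seek:

-- Not SARGable: wrapping the column in a function forces a scan
SELECT Id FROM dbo.Users WHERE YEAR(CreatedDate) = 2019;

-- SARGable: leaving the column bare lets the optimizer seek
SELECT Id FROM dbo.Users WHERE CreatedDate >= '2019-01-01' AND CreatedDate < '2020-01-01';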

Tracking Who Last Changed a Row


This post is a response to this month's T-SQL Tuesday #118 prompt by Kevin Chant. T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month's topic asks about our fantasy SQL Server feature.


Watch this week's video on YouTube

Data About The Data

Have you ever wondered who was the last person (or process) to modify a piece of data in your database?

SQL Server offers plenty of system views and functions that provide insight into how your server is running and how your queries are performing. However, it doesn't offer much information about who last modified your data.
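For example, a quick query against sys.dm_exec_requests shows exactly what's executing on your server right now, but there is no comparable built-in view that tells you who last touched a given row:

-- Plenty of visibility into what is running at this moment...
SELECT session_id, status, command, total_elapsed_time
FROM sys.dm_exec_requests;
-- ...but no equivalent for "who last modified this row of data?"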

That's why my ideal SQL Server feature would be to have a built-in way to audit who last modified a row of data and at what time.

Current Workarounds

Today, this sort of logging can be implemented in the application layer, but that requires extra coding and time.

In SQL Server, temporal tables offer part of this functionality with their GENERATED ALWAYS AS ROW START/END properties, but these only log a row created/row last modified date. There is no built-in way to log which user modified the data. The remaining temporal table functionality also adds unnecessary overhead if you don't actually need to keep track of all of the data history.
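For reference, here is a minimal sketch of that temporal table setup (the table and column names are made up for illustration):

CREATE TABLE dbo.TemporalExample
(
    Id int IDENTITY CONSTRAINT PK_TemporalExample PRIMARY KEY,
    ColA int,
    -- The period columns record WHEN each row version was valid...
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
-- ...but nothing records WHO made each change
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TemporalExampleHistory));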

Default constraints can help populate values for when a row was created and which user created it, but restricting unauthorized access to those fields, as well as handling rows that are later updated, is not as straightforward.
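Something like this works fine at insert time, but updates are left entirely on the honor system (again, these names are just for illustration):

CREATE TABLE dbo.DefaultsExample
(
    Id int IDENTITY CONSTRAINT PK_DefaultsExample PRIMARY KEY,
    ColA int,
    -- Populated automatically on INSERT only; an UPDATE won't touch these
    CreatedDate datetime2 CONSTRAINT DF_DefaultsExample_Date DEFAULT SYSUTCDATETIME(),
    CreatedUser nvarchar(128) CONSTRAINT DF_DefaultsExample_User DEFAULT SYSTEM_USER
);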

The closest you can get to this type of automatic logging in SQL Server today is to implement the functionality with triggers. Adapting some code I wrote when discussing how to fake temporal tables with triggers, we come up with this:

DROP TABLE IF EXISTS dbo.TestData;
CREATE TABLE dbo.TestData (
    Id int IDENTITY CONSTRAINT PK_Id PRIMARY KEY,
    ColA int,
    ColB int,
    ColC int,
    LastModifiedDate datetime2,
    LastModifiedUser nvarchar(30)
);
GO

CREATE TRIGGER AutoLogger ON dbo.TestData
AFTER INSERT,UPDATE
AS
BEGIN
SET NOCOUNT ON;

DECLARE
    @CurrentDateTime datetime2 = SYSUTCDATETIME(),
    @CurrentUser nvarchar(30) = SYSTEM_USER;

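-- Stamp the audit columns on every row that was just inserted or updated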
UPDATE t
SET
       LastModifiedDate = @CurrentDateTime,
       LastModifiedUser = @CurrentUser
FROM
    dbo.TestData t
    INNER JOIN inserted i
        ON t.Id = i.Id;
END
GO

Now, any time a record is created or modified, we log the datetime and the user that performed the modification:

-- Regular insert
INSERT INTO dbo.TestData (ColA,ColB,ColC) VALUES (1,2,3);
-- Regular update
UPDATE dbo.TestData SET ColA = 4 WHERE Id = 1;
SELECT * FROM dbo.TestData;

-- Questionable motive insert
INSERT INTO dbo.TestData (ColA,ColB,ColC,LastModifiedDate,LastModifiedUser) VALUES (1,2,3,'9999-12-31','NotMe');
-- Questionable motive update
UPDATE dbo.TestData SET LastModifiedDate = '9999-12-31', LastModifiedUser='NotMe' WHERE Id = 1;
SELECT * FROM dbo.TestData;


While this solution provides some of the functionality I want, it has many downsides. First, it relies on a trigger, and triggers often get overlooked (at least initially) when debugging issues, causing confusion and maintenance headaches.

Secondly, there is some overhead in having a trigger run after each and every insert and update. Transaction throughput gets limited since every INSERT/UPDATE on this table will trigger a follow up UPDATE.

Additionally, this solution is not automatic: it must be created individually on every table you want logging on.

Finally, this table now contains extra columns on the clustered index, columns that I don't necessarily want impacting my performance.

The Ideal Feature

I wish there was a database level option that allowed logging of who modified what row of data when. When turned on, it would automatically handle this logging logic (and maybe some more, like specifically what column in addition to what row was modified) without the need to set it up on individual tables or by using triggers.

Additionally, I would love it if this data were not persisted on the table's clustered index itself. If there were a way to store the data in a nonclustered index for that table only (kind of like how a non-persisted computed column value gets stored), that would be ideal.

Finally, I would love it if this metadata were populated asynchronously so it wouldn't impact the performance of inserts/updates/deletes on the main table of data.

SELECT Expression Execution Order

Watch this week's video on YouTube

Today I want to share with you an interesting observation I made about SELECT expression execution order.

I was working on writing a dynamic SQL query that would transform the following piece of JSON:

{"data":[["a","b","c"],["d","e","f"]]}

Into a query that looked like this:

SELECT 'a' AS Row0Column0, 'b' AS Row0Column1, 'c' AS Row0Column2 
UNION ALL
SELECT 'd' AS Row1Column0, 'e' AS Row1Column1, 'f' AS Row1Column2 

Normally I would use something like OPENJSON and PIVOT to transform the original data into a table result set, but in this instance my requirements dictated that I needed to build the code as a series of SELECT and UNION ALL statements.

The first step in building this query was using OPENJSON to parse the JSON data into rows and value arrays:

SELECT
    *
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]


This first query was a good start. I then added a variable @RowQuery and started building my dynamic SQL code to generate my SELECT and UNION ALL statements:

DECLARE 
    @RowQuery varchar(max)

/* TOP is here to get the ORDER BY to work as expected */
SELECT TOP 134960239460263
    @RowQuery =  COALESCE (@RowQuery + ' ','') + '''' + v.[value] + ''' as [Row'+r.RowNumber+'Column'+v.[key]+']'
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

PRINT 'SELECT ' + @RowQuery;


At this point I had the row/column numbering correct, but I still needed to add a UNION ALL SELECT before the start of each row.

I thought, "Oh, this is easy. Since the dynamic SQL I'm building is basically a loop, I need to check for a change in the RowNumber column's value to identify I'm on a new row. If I am, I can insert the UNION ALL SELECT text and I'll be all set":

DECLARE 
    @RowQuery varchar(max),
    @CurrentRow int = 0;

/* TOP is here to get the ORDER BY to work as expected */
SELECT TOP 134960239460263
    @RowQuery =  COALESCE (@RowQuery + '','') + IIF(r.RowNumber > @CurrentRow, CHAR(10)+'UNION ALL'+CHAR(10)+'SELECT ', ', ')+'''' + v.[value] + ''' as [Row'+r.RowNumber+'Column'+v.[key]+']',
    @CurrentRow = IIF(r.RowNumber > @CurrentRow, r.RowNumber, @CurrentRow)
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

/*remove the first comma and add an initial SELECT */
PRINT STUFF(@RowQuery,1,1,'SELECT'); 


Success! But as I was celebrating my dynamic SQL victory, I realized I was making an assumption about SQL Server that I had never thought about before:

The above query only works because SQL Server evaluates the expressions in the SELECT list sequentially. I'm incrementing @CurrentRow only after processing my @RowQuery variable, and this logic works correctly only if SQL Server evaluates the variable expressions in the order they appear in the SELECT list. If SQL Server were evaluating items in the SELECT list in reverse or random order, @CurrentRow could potentially get set BEFORE @RowQuery was evaluated, causing the logic of adding "UNION ALL SELECT" in the right location to fail.

This surprised me because I don't usually think about the column execution order of a query. Normally column expressions in the SELECT statement are independent of each other so the order that the columns are executed in doesn't really matter. But in this example, the column execution order does matter and it's reassuring to see SQL Server do what I assumed it was doing.
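Here is a stripped-down version of the same pattern if you want to see the behavior for yourself; in my tests it returns a = 10 and b = 1, consistent with left-to-right evaluation:

DECLARE @a int = 0, @b int;

SELECT
    @b = @a + 1, -- reads @a before the next expression changes it
    @a = 10;

SELECT @a AS a, @b AS b;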

Now, I can't guarantee this always works. I tried but failed to think of a scenario where SQL Server wouldn't execute the columns in sequential order. While the query seemed to work as expected in all of the tests I ran, I'll leave this observation open ended in case anyone has ever encountered a scenario or has any ideas of when SQL Server doesn't process SELECT statement expressions in the order they are listed.

Multiple Identity Inserts

Watch this week's video on YouTube

This week I want to share something that surprised me about using SQL Server's SET IDENTITY_INSERT statement.

I started with two tables with identity columns defined:

CREATE TABLE dbo.[User]
(
    Id int identity,
    UserName varchar(40)
);
CREATE TABLE dbo.StupidQuestions
(
    Id bigint identity,
    UserId int,
    Question varchar(400)
);


INSERT INTO dbo.[User] (UserName) VALUES ('Jim');
INSERT INTO dbo.[User] (UserName) VALUES ('Jane');
INSERT INTO dbo.[User] (UserName) VALUES ('Jin');
INSERT INTO dbo.[User] (UserName) VALUES ('Joyce');

INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (1,'Is smooth peanut butter better than chunky?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (1,'Do I really need to backup my production databases?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (2,'How to grant developers SA access?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (3,'I''m getting an error about not being able to add any more indexes to my table - how do I increase the limit?');
INSERT INTO dbo.StupidQuestions (UserId,Question) VALUES (4,'How can I include more than 32 columns in my index key?');
GO

I wanted to copy the data from these two tables into two other tables:

CREATE TABLE dbo.User_DEV
(
    Id int identity,
    UserName varchar(40)
);

CREATE TABLE dbo.StupidQuestions_DEV
(
    Id bigint identity,
    UserId int,
    Question varchar(400)
);

This would allow me to safely test some changes on these _DEV table copies without breaking my original tables.

The next step was to write a couple of INSERT INTO SELECT statements:

INSERT INTO dbo.User_DEV
SELECT Id,UserName FROM dbo.[User]

INSERT INTO dbo.StupidQuestions_DEV
SELECT Id,UserId,Question FROM dbo.StupidQuestions

And of course as soon as I executed them SQL Server threw an error stating that I can't INSERT data into tables containing identity columns without first enabling identity inserts:

An explicit value for the identity column in table 'dbo.User_DEV' can only be specified when a column list is used and IDENTITY_INSERT is ON.

Ok, simple enough to fix: we just need to do what the error message says and SET IDENTITY_INSERT ON for both tables:

SET IDENTITY_INSERT dbo.User_DEV ON;  
SET IDENTITY_INSERT dbo.StupidQuestions_DEV ON;  

And... it still didn't work:

IDENTITY_INSERT is already ON for table 'IdentityTest.dbo.User_DEV'. Cannot perform SET operation for table 'dbo.StupidQuestions_DEV'.

One at a time

Although I've probably moved data around like this hundreds (thousands?) of times before, I've never encountered this particular error.

Apparently SQL Server only allows one table to have the IDENTITY_INSERT property enabled at a time within each session. The solution therefore is straightforward: enable identity inserts and copy each table's data one at a time:

SET IDENTITY_INSERT dbo.User_DEV ON; 
INSERT INTO dbo.User_DEV (Id,UserName)
SELECT Id,UserName FROM dbo.[User];
SET IDENTITY_INSERT dbo.User_DEV OFF; 

SET IDENTITY_INSERT dbo.StupidQuestions_DEV ON;
INSERT INTO dbo.StupidQuestions_DEV (Id,UserId,Question)
SELECT Id,UserId,Question FROM dbo.StupidQuestions;
SET IDENTITY_INSERT dbo.StupidQuestions_DEV OFF;

20/20

In hindsight, I think I've never encountered this error before because I normally use the Export Data Wizard in SSMS or a dedicated SSIS package to move data around. Either of those options is typically easier than writing T-SQL, whether I'm moving data across servers or regularly refreshing tables with test data.

However, when using either of those options I never paid attention to the implementation details, which led me to assume I knew how SQL Server handles identity inserts.

Trailing Spaces in SQL Server

Watch this week's video on YouTube

A long time ago I built an application that captured user input. One feature of the application was to compare the user's input against a database of values.

The app performed this text comparison as part of a SQL Server stored procedure, allowing me to easily update the business logic in the future if necessary.

One day, I received an email from a user saying that the value they were typing in was matching a database value that they knew shouldn't match. That is the day I discovered SQL Server's counterintuitive equality comparison when dealing with trailing space characters.

Padded white space

You are probably aware that the CHAR data type pads the value with spaces until the defined length is reached:

DECLARE @Value CHAR(10) = 'a'
SELECT
    @Value AS OriginalValue,
    LEN(@Value) AS StringLength,
    DATALENGTH(@Value) AS DataLength,
    CAST(@Value AS VARBINARY(MAX)) AS StringToHex;


The LEN() function shows the number of characters in our string, while the DATALENGTH() function shows us the number of bytes used by that string.

In this case, DATALENGTH is equal to 10. This result is due to the padded spaces occurring after the character "a" in order to fill the defined CHAR length of 10. We can confirm this by converting the value to hexadecimal. We see the value 61 ("a" in hex) followed by nine "20" values (spaces).

If we change our variable's data type to VARCHAR, we'll see the value is no longer padded with spaces:

DECLARE @Value VARCHAR(10) = 'a'
SELECT
    @Value AS OriginalValue,
    LEN(@Value) AS StringLength,
    DATALENGTH(@Value) AS DataLength,
    CAST(@Value AS VARBINARY(MAX)) AS StringToHex;


Given that one of these data types pads values with space characters while the other doesn't, what happens if we compare the two?

DECLARE 
    @CharValue CHAR(10) = '',
    @VarcharValue VARCHAR(10) = ''
SELECT
    IIF(@CharValue=@VarcharValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@CharValue) AS CharBytes,
    DATALENGTH(@VarcharValue) AS VarcharBytes


In this case SQL Server considers both values equal, even though we can confirm that the DATALENGTHs are different.

This behavior doesn't occur only with mixed data type comparisons, however. If we compare two values of the same data type, with one value containing several space characters, we experience something...unexpected:

DECLARE 
    @NoSpaceValue VARCHAR(10) = '',
    @MultiSpaceValue VARCHAR(10) = '    '
SELECT
    IIF(@NoSpaceValue=@MultiSpaceValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes


Even though our two variables have different values (an empty string compared to four space characters), SQL Server considers these values equal.

If we add a character with some trailing whitespace we'll see the same behavior:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a',
    @MultiSpaceValue VARCHAR(10) = 'a     '
SELECT
    IIF(@NoSpaceValue=@MultiSpaceValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes


Both values are clearly different, but SQL Server considers them to be equal to each other. Switching our equal sign to a LIKE operator changes things slightly:

DECLARE 
   @NoSpaceValue VARCHAR(10) = 'a',
   @MultiSpaceValue VARCHAR(10) = 'a     '
SELECT
   IIF(@NoSpaceValue LIKE @MultiSpaceValue,1,0) AS ValuesAreEqual,
   DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
   DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes


Even though I would think that a LIKE without any wildcard characters would behave just like an equal sign, SQL Server doesn't perform these comparisons the same way.

If we switch back to our equal sign comparison and prefix our character value with spaces we'll also notice a different result:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a',
    @MultiSpaceValue VARCHAR(10) = '    a'
SELECT
    IIF(@NoSpaceValue=@MultiSpaceValue,1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes


SQL Server considers two values equal regardless of spaces occurring at the end of a string. Spaces preceding a string, however, are no longer considered a match.

What is going on?

ANSI

While counterintuitive, SQL Server's behavior is justified: it follows the ANSI specification for comparing strings, padding the shorter string with spaces until both strings are the same length before comparing them. This explains the phenomenon we are seeing.

It does not do this for the LIKE operator, however, which explains the difference in behavior.

Comparisons when extra spaces matter

Let's say we want to do a comparison where the difference in trailing spaces matters.

One option is to use the LIKE operator as we saw a few examples back. This is not the typical use of LIKE, however, so be sure to add a comment explaining what your query is attempting to do. The last thing you want is for some future maintainer of your code to switch it back to an equal sign because they don't see any wildcard characters.
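Something like this, with the reasoning spelled out right next to the operator:

DECLARE
    @Stored varchar(10) = 'a',
    @Input  varchar(10) = 'a ';

SELECT
    -- LIKE on purpose (not =): trailing spaces in the pattern are
    -- significant to LIKE, so 'a' does not match 'a ' here
    IIF(@Stored LIKE @Input, 1, 0) AS ValuesAreEqual;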

Another option that I've seen is to perform a DATALENGTH comparison in addition to the value comparison:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a',
    @MultiSpaceValue VARCHAR(10) = 'a    '
SELECT
    IIF(@NoSpaceValue = @MultiSpaceValue AND DATALENGTH(@NoSpaceValue) = DATALENGTH(@MultiSpaceValue),1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes


This solution isn't right for every scenario, however. For starters, you have no way of knowing whether SQL Server will execute your value comparison or your DATALENGTH predicate first. This could wreak havoc on index usage and cause poor performance.

A more serious problem can occur if you are comparing fields with different data types. For example, when comparing a VARCHAR to an NVARCHAR, it's pretty easy to create a scenario where your comparison query using DATALENGTH will trigger a false positive:

DECLARE 
    @NoSpaceValue VARCHAR(10) = 'a ',
    @MultiSpaceValue NVARCHAR(10) = 'a'
SELECT
    IIF(@NoSpaceValue = @MultiSpaceValue AND DATALENGTH(@NoSpaceValue) = DATALENGTH(@MultiSpaceValue),1,0) AS ValuesAreEqual,
    DATALENGTH(@NoSpaceValue) AS NoSpaceBytes,
    DATALENGTH(@MultiSpaceValue) AS MultiSpaceBytes


Here the NVARCHAR stores 2 bytes for every character, causing the DATALENGTH of the single-character NVARCHAR value to equal the DATALENGTH of the VARCHAR value containing a character plus a space.
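You can see the per-character storage difference directly:

SELECT
    DATALENGTH(CAST('a' AS varchar(10)))   AS VarcharBytes,  -- 1 byte
    DATALENGTH(CAST(N'a' AS nvarchar(10))) AS NvarcharBytes; -- 2 bytes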

The best thing to do in these scenarios is understand your data and pick a solution that will work for your particular situation.

And maybe trim your data before insertion (if it makes sense to do so)!
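For example, something as simple as trimming at insert time can prevent these comparison surprises later (the table and variable names here are hypothetical, and this only makes sense if trailing whitespace carries no meaning in your data):

DECLARE @UserInput varchar(100) = 'some answer   ';
DECLARE @Responses TABLE (ResponseText varchar(100));

-- Trim trailing spaces up front so later equality comparisons behave predictably
INSERT INTO @Responses (ResponseText) VALUES (RTRIM(@UserInput));

SELECT ResponseText, DATALENGTH(ResponseText) AS StoredBytes FROM @Responses;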