Tracking Who Last Changed a Row

Published on: 2019-09-10

This post is a response to this month’s T-SQL Tuesday #118 prompt by Kevin Chant.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month’s topic asks about our fantasy SQL Server feature.


Watch this week’s episode on YouTube!

Data About The Data

Have you ever wondered who was the last person (or process) to modify a piece of data in your database?

SQL Server offers plenty of system views and functions that provide insight into how your server is running and how your queries are performing. However, it doesn’t offer much information about who last modified your data.

That’s why my ideal SQL Server feature would be to have a built-in way to audit who last modified a row of data and at what time.

Current Workarounds

Today, this sort of logging can be implemented in the application layer but that requires extra coding and time.

In SQL Server, temporal tables offer part of this functionality with their GENERATED ALWAYS FOR ROW START/END properties, but these only log a row created/row last modified date. There is no built-in way to log which user modified the data. The remaining temporal table functionality also adds unnecessary overhead if you don’t actually need to keep track of all of the data history.

Default constraints exist for helping insert default values for when a row was created and who the user was that created the row, but restricting unauthorized access to those fields as well as handling instances where data is updated is not as straight forward.

The closest thing to get this type of automatic logging in SQL Server today is to implement the functionality with triggers. Reinterpreting some of my code I wrote when discussing how to fake temporal tables with triggers, we come up with this:

DROP TABLE dbo.TestData;
CREATE TABLE dbo.TestData (
	Id int IDENTITY CONSTRAINT PK_Id PRIMARY KEY,
	ColA int,
	ColB int,
	ColC int,
	LastModifiedDate datetime2,
	LastModifiedUser nvarchar(30)
);
GO

CREATE TRIGGER AutoLogger ON dbo.TestData
AFTER INSERT,UPDATE
AS
BEGIN
SET NOCOUNT ON;

DECLARE 
	@CurrentDateTime datetime2 = SYSUTCDATETIME(),
	@CurrentUser nchar(30) = SYSTEM_USER

UPDATE t
SET
       LastModifiedDate = @CurrentDateTime,
	   LastModifiedUser = @CurrentUser
FROM
    dbo.TestData t
    INNER JOIN inserted i
        ON t.Id = i.Id;
END
GO

Now, any time a record is created or modified, we log the datetime and the user that performed the modification:

-- Regular insert
INSERT INTO dbo.TestData (ColA,ColB,ColC) VALUES (1,2,3);
-- Regular update
UPDATE dbo.Test SET ColA = 4 WHERE Id = 1;
SELECT * FROM dbo.TestData;

-- Questionable motive insert
INSERT INTO dbo.TestData (ColA,ColB,ColC,LastModifiedDate,LastModifiedUser) VALUES (1,2,3,'9999-12-31','NotMe');
-- Questionable motive update
UPDATE dbo.TestData SET LastModifiedDate = '9999-12-31', LastModifiedUser='NotMe' WHERE Id = 1;
SELECT * FROM dbo.TestData;
Logged dates and users

While this solution provides some of the functionality I want, it has many downsides. First, it utilizes a trigger which often gets overlooked (at least initially) when debugging issues, causing confusion and maintenance headaches.

Secondly, there is some overhead in having a trigger run after each and every insert and update. Transaction throughput gets limited since every INSERT/UPDATE on this table will trigger a follow up UPDATE.

Additionally, this solution is not automatic: it must be created individually on every table you want logging on.

Finally, this table now contains extra columns on the clustered index, columns that I don’t necessarily always want to be impacting my performance.

The Ideal Feature

I wish there was a database level option that allowed logging of who modified what row of data when. When turned on, it would automatically handle this logging logic (and maybe some more, like specifically what column in addition to what row was modified) without the need to set it up on individual tables or by using triggers.

Additionally, I would love if this data were not persisted on the table’s clustered index itself. If there were a way to store the data in a nonclustered index for that table only (kind of like a non-persisted computed column value gets stored) that would be ideal.

Finally, I would love if this meta data were populated asynchronously to not impact the performance of inserts/updates/deletes on the main table of data.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Whitespace, Letter Case, and Other Things That Prevent Plan Reuse

Published on: 2019-09-03

Watch this week’s episode on YouTube.

Last week’s post briefly mentioned that SQL server may not reuse a query plan from cache if there is a small difference in whitespace.

In addition to differences in whitespace, SQL Server won’t reuse query plans for a variety of reasons. Today’s post will cover some of the more common scenarios of when a plan won’t be reused.

Viewing the Cache

One way to view what’s in SQL Server’s query plan cache is by querying the sys.db_exec_query_stats DMV. Combined with the sys.dm_exec_sql_text and sys.dm_exec_query_plan functions, we can get an idea of what’s in the cache and how often it’s getting reused:

SELECT
	s.sql_handle,
	s.plan_generation_num,
	s.plan_handle,
	s.execution_count,
	s.query_hash,
	s.query_plan_hash,
	t.text,
	p.query_plan
FROM 
	sys.dm_exec_query_stats s
	CROSS APPLY sys.dm_exec_sql_text(sql_handle) t
	CROSS APPLY sys.dm_exec_query_plan(plan_handle) p

Now that we know how to view information about the query plan cache, let’s run DBCC FREEPROCCACHE to clear it out so we can start with a clean slate (warning: probably a bad idea to run this in production).

Patterns That Prevent Plan Reuse

With our plan cache empty, let’s run some similar queries and see if SQL Server decides to reuse plans, or insert nearly duplicate entries into the plan cache.

Whitespace

If we run these two queries:

SELECT TOP 100 DisplayName FROM dbo.Users;
GO
SELECT TOP 100 DisplayName FROM     dbo.Users;
GO

And then go check our query plan cache:

Two separate plans due to whitespace

We’ll notice that SQL Server compiled two separate plans for what is essentially the same exact query (the only difference being the extra spaces before the table name). What this means is that if your queries are not exactly the same, or if you are in the habit of highlighting queries in SSMS before you run them (and accidently select some extra white space before/after your query), SQL Server will generate a brand new plan for you even though it has a compiled plan for a nearly identical query already in cache.

Don’t think this can happen to you? While filming the video for this blog post, I accidently made this exact mistake even though I had reviewed my code and I was intentionally trying to show you what not to do. It’s easy to slip up. Check out the blooper at 2:25 if interested.

Letter Case

Differences in letter casing also prevent cached plan reuse:

SELECT TOP 100 DisplayName FROM dbo.Users;
GO
select TOP 100 DisplayName from dbo.Users;
GO
Different case

In this case, only our SELECT and FROM keywords have changed from upper to lower case, but in the eyes of SQL Server these queries are now different and it will generate plans individually for them.

Comments

Want to improve the clarity of your business logic with a comment? Great! But watch out if running the query with and without comments because SQL Server will consider these queries to be different:

SELECT TOP 100 DisplayName FROM dbo.Users;
GO
/* This query returns all user names */
SELECT TOP 100 DisplayName FROM dbo.Users;
GO
Comments

Schema Names

Running the following two queries:

SELECT TOP 100 DisplayName FROM dbo.Users;
GO
SELECT TOP 100 DisplayName FROM Users;
GO
Unspecified schemas

These two queries create two separate entries in the plan cache. The problem here is that every user can have a different default schema, so if you don’t prefix a table name with a schema, SQL Server won’t be able to guarantee that each execution of the query should run for the same exact table, forcing it to generate new plans and prevent reuse.

SET Options

If you have two queries running with two different configurations of SET options, the queries will also generate separate plans:

SET ANSI_PADDING ON;
GO
SELECT TOP 100 DisplayName FROM dbo.Users;
GO
SET ANSI_PADDING OFF;
GO
SELECT TOP 100 DisplayName FROM dbo.Users;
GO
Two plans for different set options

Simple Parameterization Datatypes

If a query is simple enough for SQL Server to parameterize it on its own, we’ll see the parameterize query usage count increase since SQL Server can reuse the query:

SELECT DisplayName FROM dbo.Users WHERE Id = 3;
GO
SELECT DisplayName FROM dbo.Users WHERE Id = 4;
GO
Two queries, one cached entry

However, you’ll notice that SQL Server defined the parameterized value as tinyint. If our query suddenly sends in a value larger than 255:

SELECT DisplayName FROM dbo.Users WHERE Id = 256;
GO
New @smallint parameter

We’ll see a new plan generated and stored in cache.

What is particularly interesting is that if we have differences in whitespace or comments in queries that can be simply parameterized, SQL Server will actually correct for this and be able to reuse the same plan even though queries without simple parameterization would fail to reuse the same plan:

SELECT DisplayName FROM dbo.Users WHERE Id = 3;
GO
SELECT DisplayName FROM dbo.Users WHERE Id =         3;
GO
SELECT DisplayName FROM dbo.Users WHERE             Id = 3;
GO
SELECT DisplayName /*comment*/ FROM dbo.Users WHERE             Id = 3;
GO
All four queries, one cached plan entry

Be Consistent

In conclusion, SQL Server is very particular about the differences in your submitted queries, no matter how small those differences may be. If you want SQL Server to be able to reuse plans from its cache, you need to make sure those queries are exactly the same.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

How I Troubleshoot SQL Server Execution Plans

Published on: 2019-08-27

Watch this week’s video on YouTube.

Today I’m concluding my series on SQL Server execution plans by sharing the specific steps I take when troubleshooting a slow performing query.

Miss the first 4 parts of this execution plans series? Check out part 1’s introduction to execution planspart 2’s overview of statistics, and part 3’s explanation of how to read an execution plan, and part 4’s look at commonly troublesome operators).

Getting the Execution Plan

In part 1 of this series we discussed the different types of execution plans available and how to view them. My preference is to start with the poor performing query and request an execution plan for it.

Split windows with estimated and actual execution plans

With the query pasted and formatted in my SSMS editor window, I like retrieving the estimated execution plan first, and then pasting the query into a second editor window and executing the query with the “Include Actual Execution Plan” option turned on. For bonus points, I’ll split the SSMS window vertically so I can start looking at the estimated execution plan while the query runs and returns the actual execution plan: I like this combination because I (almost) immediately receive my estimated execution plan and can start looking for problems. Once the query on the right finishes executing and I get the actual plan with all of its lovely run-time stats, I usually switch to that looking at that one.

Some people take the opposite approach, looking at a cached query plan first before running the query to get a plan. This is fine too, but from my experience running the query first gives me some other data points to work with: is this query still running slowly, or was it a temporary issue? Did the person I’m reviewing the query for make some mistake and saw poor performance for some other unrelated reason? Getting the execution plan myself will help answer some of these questions.

Live Query Statistics

But flaws exist with my process too. For example, running the query in an SSMS window may generate a new plan instead of reusing a cached plan (if something like white space is different between the two query texts). This is one of those things that is fine as long as you remember that’s what could be happening. It even acts as a benefit because if you do get different query plans between your run and what exists in cache, then you know you might have a parameter sniffing problem. Finally, if the actual execution plan is impractical to retrieve (the query takes too long to run), then I will turn on Live Query Statistics: I don’t start with this option because most queries I deal with will return by the time I finish looking over the estimated execution plan (and at that point the additional overhead of Live Query Statistics isn’t worth it). But if I don’t have time to wait for the actual plan to return, switching to Live Query Statistics and watching the long running query execute in real time will usually help diagnose the performance issue.

Unexpected Seeks vs Scans

Once I am viewing one of the execution plans above, one of the first things I check is where my data is being retrieved from.

If I see data coming primarily coming from nonclustered indexes, I feel pretty good that the data is being retrieved efficiently since only a limited number of columns are being returned and hopefully they are coming back in a beneficial sort order (this is going on the assumption that I keep my indexes pretty narrow to begin with).

If all I see are index scans, that may be fine, but I want to make sure:

  1. I am not seeing table scans – at the very least they should be clustered index scans,
  2. I am not using an unnecessary SELECT * in my query – why read in all of that extra data into memory or prevent a narrower index usage if I don’t need it,
  3. SQL Server isn’t scanning an entire index to return only a limit subset of rows

Finally, I like to double check any index seeks I see as part of key lookups. Once again, key lookups are not necessarily bad, but if I can include just one more column into a nonclustered index and get rid of the lookup, I may consider doing so if that part of the plan seems to be a bottleneck.

Inaccurate Row Estimates

Next I start looking at costly operators and checking their Actual Number of Rows vs Estimated Number of Rows values (for queries that don’t return quickly, Live Query Statistics helps identify these easily). I may also look at the relative size of arrows to see if it looks like one operator is returning or reading significantly more rows than I would expect.

If Actual vs Estimated Number of Rows are vastly different (generally, if Actual is greater than Estimated by more than 100x), I start thinking about why SQL Server might be estimating the wrong number of rows by using common sense or looking at the statistics being used.

At this point I also consider whether the query is parameterized (either explicitly in the query or automatically by SQL Server). If it is, I usually start going down the path of checking for parameter sniffing.

Suspicious Operators

Next I look for any other commonly problematic operators in my plan: things like sorts, spools, hash joins, etc…

I’ve already covered these in a previous post of my Execution Plans series, but it’s worth restating that I’m always keeping a look out for these operators.

Warnings

Finally, I quickly scan the plan for any yellow exclamation points present on any of the plan operators. These symbols indicate activities that SQL Server thinks it should warn us about. I also covered these in more detail in part 3, but it’s worth mentioning again here because looking for these warnings can be a huge help in identifying the troublesome parts of your execution plan.

Conclusion

There is no one right way to troubleshoot or performance tune a query. I use the above method because it makes sense for my world where most of the queries I performance tune are my own and I have a pretty good knowledge of what other types of queries are running on my database. I hope sharing this process can help you develop your own preferred process for query tuning in your own environment.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!