Tracking Who Last Changed a Row

Published on: 2019-09-10

This post is a response to this month’s T-SQL Tuesday #118 prompt by Kevin Chant.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month’s topic asks about our fantasy SQL Server feature.

Watch this week’s episode on YouTube!

Data About The Data

Have you ever wondered who was the last person (or process) to modify a piece of data in your database?

SQL Server offers plenty of system views and functions that provide insight into how your server is running and how your queries are performing. However, it doesn’t offer much information about who last modified your data.

That’s why my ideal SQL Server feature would be to have a built-in way to audit who last modified a row of data and at what time.

Current Workarounds

Today, this sort of logging can be implemented in the application layer but that requires extra coding and time.

In SQL Server, temporal tables offer part of this functionality with their GENERATED ALWAYS FOR ROW START/END properties, but these only log a row created/row last modified date. There is no built-in way to log which user modified the data. The remaining temporal table functionality also adds unnecessary overhead if you don’t actually need to keep track of all of the data history.

Default constraints exist for helping insert default values for when a row was created and who the user was that created the row, but restricting unauthorized access to those fields as well as handling instances where data is updated is not as straight forward.

The closest thing to get this type of automatic logging in SQL Server today is to implement the functionality with triggers. Reinterpreting some of my code I wrote when discussing how to fake temporal tables with triggers, we come up with this:

DROP TABLE dbo.TestData;
CREATE TABLE dbo.TestData (
	ColA int,
	ColB int,
	ColC int,
	LastModifiedDate datetime2,
	LastModifiedUser nvarchar(30)

CREATE TRIGGER AutoLogger ON dbo.TestData

	@CurrentDateTime datetime2 = SYSUTCDATETIME(),
	@CurrentUser nchar(30) = SYSTEM_USER

       LastModifiedDate = @CurrentDateTime,
	   LastModifiedUser = @CurrentUser
    dbo.TestData t
    INNER JOIN inserted i
        ON t.Id = i.Id;

Now, any time a record is created or modified, we log the datetime and the user that performed the modification:

-- Regular insert
INSERT INTO dbo.TestData (ColA,ColB,ColC) VALUES (1,2,3);
-- Regular update
UPDATE dbo.Test SET ColA = 4 WHERE Id = 1;
SELECT * FROM dbo.TestData;

-- Questionable motive insert
INSERT INTO dbo.TestData (ColA,ColB,ColC,LastModifiedDate,LastModifiedUser) VALUES (1,2,3,'9999-12-31','NotMe');
-- Questionable motive update
UPDATE dbo.TestData SET LastModifiedDate = '9999-12-31', LastModifiedUser='NotMe' WHERE Id = 1;
SELECT * FROM dbo.TestData;
Logged dates and users

While this solution provides some of the functionality I want, it has many downsides. First, it utilizes a trigger which often gets overlooked (at least initially) when debugging issues, causing confusion and maintenance headaches.

Secondly, there is some overhead in having a trigger run after each and every insert and update. Transaction throughput gets limited since every INSERT/UPDATE on this table will trigger a follow up UPDATE.

Additionally, this solution is not automatic: it must be created individually on every table you want logging on.

Finally, this table now contains extra columns on the clustered index, columns that I don’t necessarily always want to be impacting my performance.

The Ideal Feature

I wish there was a database level option that allowed logging of who modified what row of data when. When turned on, it would automatically handle this logging logic (and maybe some more, like specifically what column in addition to what row was modified) without the need to set it up on individual tables or by using triggers.

Additionally, I would love if this data were not persisted on the table’s clustered index itself. If there were a way to store the data in a nonclustered index for that table only (kind of like a non-persisted computed column value gets stored) that would be ideal.

Finally, I would love if this meta data were populated asynchronously to not impact the performance of inserts/updates/deletes on the main table of data.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

7 thoughts on “Tracking Who Last Changed a Row”

  1. If applications use service accounts for access to data/stored procedures/… you will only see those account names when you refer to system users references. If you want to know who the logged in person at the keyboard was you’ll need to have the application collect and send.

    1. Was thinking the same thing. I work w/ SaaS DBs primarily – multi-tenant. One service account does the work, passes in the “user id” along the way. We can track that through columns on the table, but we can’t really use windows auth. Same for machine name – it all comes from the app server so we don’t know the actual person/machine.

  2. It’s even trickier, when app is using a service account and a bunch of stored procedures, that update,insert a particular table and you want to know, which SP it was. It’s hard to use XE, because you’d have to filter on sql_text, which is quite expensive. One could use DBCC INPUTBUFFER and a trigger, which unfortunately requires high permissions.

  3. Just to add what Bert stated about columns that keep track of when a row was last updated and by whom (I refer to these as “LMB” columns, which is short for “LastModifiedBy”).

    First, there actually is an open item on MS Azure Feedback on the subject. I up-voted it a while back. I encourage other folks to do the same. Here’s the link.

    Of course, as Bert also mentions, it would be nice if Temporal Tables had this feature built in, as well, so that we wouldn’t have to add the extra bytes to our Clustered Indexes, which inherently slows down code that does any kind of scan.

    For those that use LMB columns on a regular basis, be VERY aware that if the column defaults to NULL, then you have created a source of MASSIVE number of bad page splits and the MASSIVE amount of fragmentation and excess logfile generation and “unexplainable” blocking that goes with the bad page splits (it’s why they labeled as “bad”).

    The reason for the page split problem is simple. No matter what the Fill Factor is, newly inserted rows (with only 1 exception that I won’t go into here) will ALWAYS try to fill pages to as close to 100% full as the given row widths will allow. That means that your carefully planned Clustered Index based on an ever-increasing key will work really well with no bad page splits for all INSERTs. But even if all of the columns in the table are fixed-width in nature, there’s a really good change the LastModifiedBy (or whatever) column is based on a variable width datatype. Guess what happens when it goes from NULL to even 1 character in value? The row gets wider and it may no long fit on the extremely full page it lives on and ***WHAM***… bad page split and all the hell that it produces.

    And guess what else… this happens on the most active/recent data which is where people are concentrating their queries on and those split pages are only 50% (usually) or so full and THAT means that you’re also and actually wasting 50% of the memory that holds the table in the buffers.

    If you MUST use LMB columns, especially “LastModifiedBy”, you can do things to prevent the splitting problem and all that goes with it. Here’s a list of possibilities…
    1. Populate “LastModifiedBy” with the same data as the related “CreatedBy” column (which a lot of people include as part of their “LMB” column “auditing”.
    2. Keep a table with separate users in it with an IDENTITY column in it. Have your “LMB” triggers do a lookup on the table to get the value from the IDENTITY column and use THAT instead of the name. Of course, the “CreatedBy” (which doesn’t cause page splits because it’s never updated after the initial insert) and the “LastModifiedBy” columns should be converted to the INT datatype instead of VARCHAR or NVARCHAR.
    3. For certain large tables that suffer huge updates as part of an import or other processing batch routine, figure out what the most common or widest well used width of the “LastModifiedBy” entries are and default it to that number of spaces.
    4. Stop doing “poor man’s auditing” with “LMB” columns.

  4. Hey

    I might be wrong but did you do a video(or talked about) “Change Tracking” in sql server. I seem to remember in the video you would insert data and then update the data and you would see the difference with running a query.

    Did you do a video of this nature?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.