Here's a Quick Way To Generate a Running Total in SQL Server

da24c-105adqqwalb9gszttpjw23q

Watch this week's video on YouTube

Historically it's been difficult to accomplish certain tasks in SQL Server.

Probably the most annoying problem I had to do regularly before SQL Server 2012 was to generate a running total. How can a running total be so easy to do in Excel, but difficult to do in SQL?

SUM(), click, drag, done. Excel, you will always have a place in my heart.

Before SQL Server 2012, the solution to generating a running total involved cursors, CTEs, nested subqueries, or cross applies. This StackOverflow thread has a variety of solutions if you need to solve this problem in an older version of SQL Server.

However, SQL Server 2012's introduction of window functions makes creating a running total incredibly easy.

First, some test data:

CREATE TABLE dbo.Purchases
(
  CustomerID int,
  TransactionDate date,
  Price int
)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-01',5)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-15',8)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-18',3)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-30',6)
INSERT INTO dbo.Purchases VALUES (2,'2017-05-04',5)
INSERT INTO dbo.Purchases VALUES (2,'2017-06-04',5)
INSERT INTO dbo.Purchases VALUES (2,'2017-07-04',1)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-01',2)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-02',8)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-03',9)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-04',5)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-05',4)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-06',2)

6a0d6-1wnrnamwua7g-rnvwtlgyzw

Next, we write our query using the following window function OVER() syntax:

SELECT
  CustomerID,
  TransactionDate,
  Price,
  SUM(Price) OVER (PARTITION BY CustomerID ORDER BY TransactionDate) AS RunningTotal
FROM
  dbo.Purchases

The syntax for our OVER() clause is simple:

  • SUM(Price) specifies which column we are creating the running total on
  • PARTITION BY specifies around what group of data we want to create our "window" — each new window will reset the running total
  • ORDER BY specifies in what order the rows should be sorted before being summed

The results? An easy to write running total:

52df1-1_ulawysmk6gmrjzvuaijgq

My Most Embarrassing SQL Moment

T-SQL Tuesday #92: Lessons Learned the Hard Way

55ae8-1lh0mvkliatliiikt0vlyow

This post is a response to this month's T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month's topic is Lessons Learned the Hard Way.


Watch this week's video on YouTube

"Is this your query that's killing the server?"

It was my first week on the job and I was learning to query one of our major databases.

Up until that point, my SQL experience was limited to working on a *tiny* e-commerce database. Query performance was never something I had to deal with because any query I wrote, no matter how poorly written, would always execute quickly.

This new database I was working on though had tables with a billion+ rows. I should have been more conscious about how I was writing my joins and filtering my records, but I wasn't. I wrote my query and executed it in SQL Server Management Studio.

About 20 minutes into my query's execution, I received an email from my new DBA, and it looked something like this:

Uhh, there might be a problem here

"Is this your query that's killing the server?"

Oops.

I don't think my mouse ever moved to the stop execution button as quickly as it did that moment.

I was incredibly embarrassed to have brought our production server to a crawl. I was also incredibly embarrassed to have had my first interaction with my new DBA be about a query that created major problems for him.

Although there were no long-term damages from my server-crushing query, it was a scenario that I definitely didn't want to relive again in the future.

Next time: don't do that again

Obviously, this was an experience where I learned that maybe I shouldn't write queries against unfamiliar data in production.

  • I should have been practicing on a dev database.
  • I should have looked at table meta data and made sure I understood relationships between tables better.
  • I should have done some more preliminary querying with more restrictive filters to be able to catch performance problems earlier on with smaller result sets.
  • I should have examined what indexes were available and made sure I was attempting to use them.
  • I should have used a (NOLOCK) if I absolutely had to test on the production data so that at the very least I wouldn't prevent the high transaction ETLs from modifying data in that database.

All of those "should haves" quickly became my checklist for what to do before running any query in an unfamiliar environment.

I've still written plenty of ugly and inefficient queries since then, however none of them ever caused me to bring the SQL server to a halt like I did in my first week. That was one lesson that I learned the hard way.

How to Automatically Purge Historical Data From a Temporal Table

Watch this week's video on YouTube

Temporal Tables are awesome.

They make analyzing time-series data a cinch, and because they automatically track row-level history, rolling-back from an "oops" scenario doesn't mean you have to pull out the database backups.

The problem with temporal tables is that they produce a lot of data. Every row-level change stored in the temporal table's history table quickly adds up, increasing the possibility that a low-disk space warning is going to be sent to the DBA on-call.

In the future with SQL Server 2017 CTP3, Microsoft allows us to add a retention period to our temporal tables, making purging old data in a temporal table as easy as specifying:

ALTER DATABASE DatabaseName
SET TEMPORAL_HISTORY_RETENTION ON

CREATE TABLE dbo.TableName (
...
) 
WITH
( 
    SYSTEM_VERSIONING = ON      
    (
        HISTORY_TABLE = dbo.TableNameHistory,            
        HISTORY_RETENTION_PERIOD = 6 MONTHS      
    )  
);

However, until we are all on 2017 in production, we have to manually automate the process with a few scripts.

Purging old data out of history tables in SQL Server 2016

In the next few steps we are going to write a script that deletes data more than a month old from my CarInventoryHistory table:

SELECT * FROM dbo.CarInventory;
SELECT * FROM dbo.CarInventoryHistory;

04cd5-1kjmrbwqd96geq1ypp5ejaa

And now if we write our DELETE statement:

ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF  ) ;
GO

-- In the real-world we would do some DATE math here
DECLARE @OneMonthBack DATETIME2 = '2017-06-04';

DELETE FROM dbo.CarInventoryHistory WHERE SysStartTime < @OneMonthBack;

You'll notice that we first had to turn system versioning off: SQL Server won't let us delete data from a history table that is currently tracking a temporal table.

This is a poor solution however. Although the data will delete correctly from our history table, we open ourselves up to data integrity issues. If another process INSERTs/UPDATEs/DELETEs into our temporal table while the history deletion is occurring, those new INSERTs/UPDATEs/DELETEs won't be tracked because system versioning is turned off.

The better solution is to wrap our ALTER TABLE/DELETE logic in a transaction so any other queries running against our temporal table will have to wait:

-- Run this in query window #1 (delete data):
BEGIN TRANSACTION;
ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = OFF );
GO

-- In the real-world we would do some DATE math here
DECLARE @OneMonthBack DATETIME2 = '2017-06-04';

DELETE FROM dbo.CarInventoryHistory WITH (TABLOCKX)
WHERE SysStartTime < @OneMonthBack;

-- Let's wait for 10 seconds to mimic a longer delete operation
WAITFOR DELAY '00:00:10';

--Re-enable our SYSTEM_VERSIONING
ALTER TABLE dbo.CarInventory SET ( SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory));
GO

COMMIT TRANSACTION;

-- Run this in query window #2 during the same time as the above query (trying to update during deletion):
UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 4;

And the result? Our history table data was deleted while still tracking the row-level data changes to our temporal table:

60a83-1sx-gxdjlwhjanonw622vnw

All that is left to do is to throw this script into a SQL Agent job and schedule how often you want it to run.

How to Put SQL Column Names Onto Multiple Lines in SSMS

A few keystrokes and BAM! A mostly formatted query

SQL in 60 Seconds is a series where I share SQL tips and tricks that you can learn and start using in less than a minute.

Watch this week's video on YouTube

Have you ever copied and pasted a query into SQL Server Management Studio and been annoyed that the list of column names in the SELECT statement were all on one line?

There are 30 columns here. Ugh.

SELECT Col1, Col2, Col3,  Col4, Col5,Col6,Col7, Col8, Col9, Col10,Col11,Col12,Col13,Col14,Col15,Col16,Col17,Col18,Col19,Col20,Col21,Col22,Col23,Col24,Col25,Col26,Col27,Col28,Col29,Col30 FROM dbo.MyTable

You can make the query easier to read by putting each column name onto its own line.

Simply open the Find and Replace window (CTRL + H) and type in ,(:Wh)* for the Find value and ,nt for the Replace value (in some versions of SSMS you may have better luck using ,(:Wh|t| )* in the Find field). Make sure "Use Regular Expressions" is checked and press Replace All:

Make sure the regular expression icon/box is checked

A few keystrokes and BAM! A mostly formatted query

The magic you just used is a Regular Expression, and Microsoft has its own flavor used in SSMS and Visual Studio. Basically, we found text that

  • began with a comma (,)
  • followed by any whitespace (:Wh) (line break, tab, space, etc…)
  • (in newer versions of SSMS we add |t| to indicate or tab or space)
  • and replaced it with a comma (,) and a new line (n) and tab (t).

Sure, this trick isn't going to give you the same output as if you used a proper SQL formatter, but this technique is free and built straight into SSMS.

5 Business Problems You Can Solve With Temporal Tables

3404b-14frjyurrbemzjlzyfhtz7w

Watch this week's video on YouTube

It's 4:30 pm on Friday and Mr. Manager comes along to tell you that he needs you to run some important ad-hoc analysis for him.

Previously this meant having to stay late at the office, writing cumbersome queries to extract business information from transactional data.

Lucky for you, you've recently started using Temporal Tables in SQL Server ensuring that you'll be able to answer your boss's questions and still make it to happy hour for \$3 margaritas.

Sound like a plan? Keep reading below!

The Data

For these demos, we'll be using my imaginary car rental business data. It consists of our temporal table dbo.CarInventory and our history table dbo.CarInventoryHistory:

I've upgraded my business — we now have FOUR Chevy Malibus available for you to rent

Business Problem #1 — "Get me current inventory!"

To get our current inventory of rental cars, all we have to do is query the temporal table:

SELECT * FROM dbo.CarInventory

That's it.

I know this query seems lame — it's just a SELECT FROM statement. There are no FOR SYSTEM TIME clauses, WHERE statements, and no other interesting T-SQL features.

But that's the point! Have you ever had to get the "current" rows out of a table that is keeping track of all transactions? I'm sure it involved some GROUP BY statements, some window functions, and more than a few cups of coffee.

Temporal tables automatically manage your transaction history, providing the most current records in one table (dbo.CarInventory) and all of the historical transactions in another (dbo.CarInventoryHistory). No need for complicated queries.

Business Problem #2 — "How many miles on average do our customers drive each of our cars?"

In this example, we use FOR SYSTEM_TIME ALL and a plain old GROUP BY to get the data we need:

  SELECT
    CarId, AVG(Mileage) AS AverageMileage
  FROM
    dbo.CarInventory FOR SYSTEM_TIME ALL
  WHERE
    InLot = 1 -- The car has been successfully returned to our lot
    AND SysStartTime > '2017-05-13 08:00:00.0000000' -- Ignore our initial car purchase
  GROUP BY
    CarId

Some cars get driven a lot more. Causation or correlation?

FOR SYSTEM_TIME ALL returns all rows from both the temporal and history table. It's equivalent to:

SELECT * FROM dbo.CarInventory 
UNION ALL 
SELECT * FROM dbo.CarInventoryHistory

Once again, there isn't anything too fancy going on here — but that's the point. With temporal tables, your data is organized to make analysis easier.

Business Problem #3 — "How many cars do we rent out week over week?"

Here at Wagner Car Rentals we want to figure out how often our cars are being rented and see how those numbers change from week to week.

SELECT
  CurrentWeek.CarId,
  CurrentWeek.RentalCount AS CurrentRentalCount,
  PreviousWeek.RentalCount AS PreviousRentalCount
FROM
  (
  SELECT
    CarId,
    COUNT(*) AS RentalCount
  FROM
    dbo.CarInventory FOR SYSTEM_TIME FROM '2017-06-05' TO '2017-06-12'
  WHERE
    InLot = 0 -- Car is out with the customer
  GROUP BY
    CarId
  ) CurrentWeek
  FULL JOIN
  (
  SELECT
    CarId,
    COUNT(*) AS RentalCount
  FROM
    dbo.CarInventory FOR SYSTEM_TIME FROM '2017-05-29' TO '2017-06-05'
  WHERE
    InLot = 0 -- Car is out with the customer
  GROUP BY
    CarId
  ) PreviousWeek
  ON CurrentWeek.CarId = PreviousWeek.CarId

0771c-1jwllidj567utcv3bgdsjow

In this query, we are using FOR SYSTEM_TIME FOR/TO on our temporal table to specify what data we want in our "CurrentWeek" and "PreviousWeek" subqueries.

FOR/TOreturns any records that were active during the specified range(BETWEEN/AND does the same thing, but its upper bound datetime2 value is inclusive instead of exclusive).

Business Problem #4 — "What color cars are rented most often?"

We're thinking of expanding our fleet of rental vehicles and want to purchase cars in the most popular colors so we can keep customers happy (and get more of their business!). How can we tell which color cars get rented most often?

SELECT 
  CarId, 
  Color, 
  COUNT(*)/2 AS RentalCount -- Divide by 2 because transactions are double counted (rental and return dates)
FROM 
  dbo.CarInventory FOR SYSTEM_TIME CONTAINED IN ('2017-05-15','2017-06-15')
GROUP BY 
  CarId,
  Color

Here we use CONTAINED IN because we want to get precise counts of how many cars were rented and returned in a specific date range (if a car wasn't returned — stolen, wrecked and totaled, etc… — we don't want to purchase more of those colors in the future).

67c8c-1xtqczezvl0y_s6hxzhpi0g

Business Problem #5 — "Jerry broke it. FIX IT!"

The computer systems that we use at Wagner Car Rentals are a little…dated.

Instead of scanning a bar code to return a car back into our system, our employees need to manually type in the car details. The problem here is that some employees (like Jerry) can't type, and often makes typos:

SELECT * FROM dbo.CarInventory FOR SYSTEM_TIME ALL WHERE CarId = 4

c0dd3-1nxqy43njb7r63theaq0kja

Having inconsistent data makes our reporting much more difficult, but fortunately since we have our temporal table tracking row-level history, we can easily correct Jerry's typos by pulling the correct values from a previous record:

;WITH InventoryHistory  
AS  
(   
  SELECT ROW_NUMBER () OVER (PARTITION BY CarId ORDER BY SysStartTime DESC) AS RN, *  
  FROM dbo.CarInventory FOR SYSTEM_TIME ALL WHERE CarId = 4  
)  
--SELECT * FROM InventoryHistory
/*Update current row by using N-th row version from history (default is 1 - i.e. last version)*/  
UPDATE dbo.CarInventory   
  SET Color = h.Color  
  FROM 
    dbo.CarInventory i 
    INNER JOIN InventoryHistory h 
        ON i.CarId = h.CarId 
        AND RN = 2

Typos fixed!

Although we could have fixed this issue without using a temporal table, it shows how having all of the row-level transaction history makes it possible to repair incorrect data in more difficult scenarios. For even hairier situations, you can even roll-back your temporal table data.

Conclusion

Temporal tables are easy to setup and make writing analytical queries a cinch.

Hopefully writing queries against temporal tables will prevent you from having to stay late in the office the next time your manager asks you to run some ad-hoc analysis.