Are your indexes being thwarted by mismatched datatypes?

Published Tue 01 August 2017 in SQL > Performance Tuning > Indexing

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Watch this week's video on YouTube

Have you ever encountered a query that runs slowly, even though you've created indexes for it?

There's a few different reasons why this may happen. The one I see most frequently happens in the following scenario.

I'll have an espresso please

Let's say I have a table dbo.CoffeeInventory of coffee beans and prices that I pull from my favorite green coffee bean supplier each week. It looks something like this:

-- Make sure Actual Execution Plan is on
-- Let's see what our data looks like
SELECT * FROM dbo.CoffeeInventory

If you want to follow along, you can get this data set from this GitHub Gist

I want to be able to efficiently query this table and filter on price, so next I create an index like so:

CREATE CLUSTERED INDEX CL_Price ON dbo.CoffeeInventory (Price)

Now, I can write my query to find out what coffee prices are below my willingness to pay:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < 6.75

You would expect this query to be blazing fast and use a clustered index seek, right?

WRONG!

What the heck?

Why is SQL scanning the table when I added a clustered index on the column that I am filtering in my predicate? That's not how it's supposed to work!

Well dear reader, if we look a little bit closer at the table scan operation, we'll notice a little something called CONVERT_IMPLICIT:

CONVERT_IMPLICIT: ruiner of fast queries

What is CONVERT_IMPLICIT doing? Well as it implies, it's having to convert some data as it executes the query (as opposed to me having specified an explicit CAST() or CONVERT() function in my query).

The reason it needs to do this is because I defined my Price column as a VARCHAR(5):

Who put numeric data into a string datatype? Someone who hasn't had their coffee yet today.

In my query however, I'm doing a comparison against a number WHERE Price < 6.75. SQL Server is saying it doesn't know how to compare a string to a number, so it has to convert the VARCHAR string to a NUMERIC(3,2).

This is painful.

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can't seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn't only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

<https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine>

That's a lot of orange circles/implicit conversions!

How can I query my coffee faster?

Well in this scenario, we have two options.

Fix the datatype of our table to align with the data actually being stored in this (data stewards love this).
Not cause SQL Server to convert every row in the column.

Number 1 above is self-explanatory, and the better option if you can do it. However, if you aren't able to modify the column type, you are better off writing your query like this:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < '6.75'

d03ed-1uzge0e3lizkhtodyonsfmg

Since we do a comparison of equivalent datatypes, SQL Server doesn't need to do any conversions and our index gets used. Woo-hoo!

What about the rest of my server?

Remember that chart above? There are a lot of different data comparisons that can force a painful column side implicit conversion by SQL Server.

Fortunately, Jonathan Kehayias has written a great query that helps you find column side implicit conversions by querying the plan cache. Running his query is a great way to identify most of the implicit conversions happening in your queries so you can go back and fix them — and then rejoice in your improved query performance!

One SSMS Trick That Will Make You a Faster Query Builder

Published Tue 25 July 2017 in SQL > SSMS

"17/365: i could be your magician" by Jin is licensed under CC BY 2.0

Watch this week's video on YouTube

Here's the scenario: you copy and paste some code into a query you are building. A few minutes later you need that same snippet again, but you've already copied and pasted something else onto the clipboard.

The next five minutes of your life are spent searching across the twenty query editor tabs you have open looking for that original piece of code.

Sound familiar?

THERE'S A BETTER WAY!

Copying and pasting is a feature that's available in nearly every text editor ("nearly" — anyone remember the days before iOS had a clipboard?).

However, SQL Server Management Studio goes above and beyond the regular copy and paste feature set — it has a clipboard ring.

What's a clipboard ring you ask?

ec575-1vgzb1j34ahgunbqofrgora

The clipboard ring let's you cycle through the last 20 things you copied onto your clipboard when you go to paste in SSMS. It can be accessed in the Edit menu (like in the screenshot above) or by using the keyboard shortcut CTRL + SHIFT + V.

Let's say you have the following queries:

----------------- Query 1 --------------------------
SELECT FruitId FROM dbo.Fruits WHERE Name = 'Apple'
----------------- Query 2 --------------------------
SELECT FruitId FROM dbo.Fruits WHERE Name = 'Banana'
----------------- Query 3 --------------------------
SELECT FruitId FROM dbo.Fruits WHERE Name = 'Orange'

And let's pretend you want to copy all of the fruit names into the IN statement of this query:

SELECT FruitId FROM dbo.Fruit WHERE Name IN ()

Instead of copying and pasting each fruit separately, you can batch your copies together and then paste them from the clipboard ring into your IN statement at the same time:

5c701-19e4bf0fjmtpji4ky8bz-mq

Use this trick the next time you need to find that snippet of code you used right before heading off to lunch and I guarantee you will be saving yourself tons of time.

How to Use SQL Temporal Tables For Easy Point-In-Time Analysis

Published Wed 19 July 2017 in SQL > Development > Temporal Tables

"Bordeaux, The Grand Theatre" by Stefano Montagner is licensed under CC BY-NC-ND 2.0

Watch this week's video on YouTube

Have you ever needed to look at what data in a table used to look like?

If you have, it probably took a knuckle-cracking filled session of writing group-by statements, nested sub-queries, and window functions to write your time-travelling query.

Sorry for your lost day of productivity — I've been there too.

Fortunately for us, SQL Server 2016 introduces a new feature to make our point-in-time analysis queries easy to write: temporal tables.

Temporal Tables? Are Those The Same As Temporary Tables?

Don't let the similar sounding name fool you: "temporal" <> "temporary".

Temporal tables consist of two parts:

The temporal table — this is the table that contains the current values of your data.
The historical table — this table holds all of the previous values that at some point existed in your temporal table.

You might have created a similar setup yourself in previous versions of SQL using triggers. However, using a temporal table is different from this because:

You don't need to write any triggers/stored procedures! All of the history tracking is done automatically by SQL Server.
Retrieving the data uses a simple WHERE clause — no complex querying required.

I want to make my life easier by using temporal tables! Take my money and show me how!

I'm flattered by your offer, but since we are good friends I'll let you in on these secrets for free.

First let's create a temporal table. I'm thinking about starting up a car rental business, so let's model it after that:

IF OBJECT_ID('dbo.CarInventory', 'U') IS NOT NULL 
BEGIN
    -- When deleting a temporal table, we need to first turn versioning off
    ALTER TABLE [dbo].[CarInventory] SET ( SYSTEM_VERSIONING = OFF  ) 
    DROP TABLE dbo.CarInventory
    DROP TABLE dbo.CarInventoryHistory
END
CREATE TABLE CarInventory   
(    
    CarId INT IDENTITY PRIMARY KEY,
    Year INT,
    Make VARCHAR(40),
    Model VARCHAR(40),
    Color varchar(10),
    Mileage INT,
    InLot BIT NOT NULL DEFAULT 1,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)     
)   
WITH 
( 
    SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CarInventoryHistory)   
)

The key things to note with our new table above are that

it contains a PRIMARY KEY.
it contains two datetime2 fields, marked with GENERATED ALWAYS AS ROW START/END.
It contains the PERIOD FOR SYSTEM_TIME statement.
It contains the SYSTEM_VERSIONING = ON property with the (optional) historical table name (dbo.CarInventoryHistory).

If we query our newly created tables, you'll notice our column layouts are identical:

SELECT * FROM dbo.CarInventory
SELECT * FROM dbo.CarInventoryHistory

Let's fill it with the choice car of car rental agencies all across the U.S. — the Chevy Malibu:

INSERT INTO dbo.CarInventory (Year,Make,Model,Color,Mileage) VALUES(2017,'Chevy','Malibu','Black',0)
INSERT INTO dbo.CarInventory (Year,Make,Model,Color,Mileage) VALUES(2017,'Chevy','Malibu','Silver',0)

Although we got some unassuming car models, at least we can express our individuality with two different paint colors!

In all of the remaining screen shots, the top result is our temporal table dbo.CarInventory and the bottom result is our historical table dbo.CarInventoryHistory.

You'll notice that since we've only inserted one row for each our cars, there's no row history yet and therefore our historical table is empty.

Let's change that by getting some customers and renting out our cars!

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 1
UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

3c413-1bm7zz7udxcqupl7lu08mdg

Now we see our temporal table at work: we updated the rows in dbo.CarInventory and our historical table was automatically updated with our original values as well as timestamps for how long those rows existed in our table.

After a while, our customers return their rental cars:

UPDATE dbo.CarInventory SET InLot = 1, Mileage = 73  WHERE CarId = 1
UPDATE dbo.CarInventory SET InLot = 1, Mileage = 488 WHERE CarId = 2

It's totally possible for someone to have driven 73 or 488 miles in a Chevy Malibu in under 4 minutes…ever hear the phrase "drive it like a rental"?

Our temporal table show the current state of our rental cars: the customers have returned the cars back to our lot and each car has accumulated some mileage.

Our historical table meanwhile got a copy of the rows from our temporal table right before our last UPDATE statement. It's automatically keeping track of all of this history for us!

Continuing on, business is going well at the car rental agency. We get another customer to rent our silver Malibu:

UPDATE dbo.CarInventory SET InLot = 0 WHERE CarId = 2

0ac4b-1oci0xi8evaahk8fidhj0fg

Unfortunately, our second customer gets into a crash and destroys our car:

DELETE FROM dbo.CarInventory WHERE CarId = 2

The customer walked away from the crash unscathed; the same can not be said for our profits.

With the deletion of our silver Malibu, our test data is complete.

Now that we have all of this great historically tracked data, how can we query it?

If we want to reminisce about better times when both cars were damage free and we were making money, we can write a query using SYSTEM_TIME AS OFto show us what our table looked like at that point in the past:

SELECT
    *
FROM 
    dbo.CarInventory
FOR SYSTEM_TIME AS OF '2017-05-18 23:49:50'

The good old days.

And if we want to do some more detailed analysis, like what rows have been deleted, we can query both temporal and historical tables normally as well:

-- Find the CarIds of cars that have been wrecked and deleted
SELECT DISTINCT
    h.CarId AS DeletedCarId
FROM
    dbo.CarInventory t
    RIGHT JOIN dbo.CarInventoryHistory h
    ON t.CarId = h.CarId 
WHERE 
    t.CarId IS NULL

8c11d-1aaxrda5ljxkpenh9ei2t5a

C̶o̶l̶l̶i̶s̶i̶o̶n̶ Conclusion

Even with my car rental business not working out, at least we were able to see how SQL Server's temporal tables helped us keep track of our car inventory data.

I hope you got as excited as I did the first time I saw temporal tables in action, especially when it comes to querying with FOR SYSTEM_TIME AS OF. Long gone are the days of needing complicated queries to rebuild data for a certain point in time.

Here's a Quick Way To Generate a Running Total in SQL Server

Published Tue 18 July 2017 in SQL > Development > Window Functions

da24c-105adqqwalb9gszttpjw23q

Watch this week's video on YouTube

Historically it's been difficult to accomplish certain tasks in SQL Server.

Probably the most annoying problem I had to do regularly before SQL Server 2012 was to generate a running total. How can a running total be so easy to do in Excel, but difficult to do in SQL?

SUM(), click, drag, done. Excel, you will always have a place in my heart.

Before SQL Server 2012, the solution to generating a running total involved cursors, CTEs, nested subqueries, or cross applies. This StackOverflow thread has a variety of solutions if you need to solve this problem in an older version of SQL Server.

However, SQL Server 2012's introduction of window functions makes creating a running total incredibly easy.

First, some test data:

CREATE TABLE dbo.Purchases
(
  CustomerID int,
  TransactionDate date,
  Price int
)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-01',5)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-15',8)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-18',3)
INSERT INTO dbo.Purchases VALUES (1,'2017-06-30',6)
INSERT INTO dbo.Purchases VALUES (2,'2017-05-04',5)
INSERT INTO dbo.Purchases VALUES (2,'2017-06-04',5)
INSERT INTO dbo.Purchases VALUES (2,'2017-07-04',1)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-01',2)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-02',8)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-03',9)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-04',5)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-05',4)
INSERT INTO dbo.Purchases VALUES (3,'2017-05-06',2)

6a0d6-1wnrnamwua7g-rnvwtlgyzw

Next, we write our query using the following window function OVER() syntax:

SELECT
  CustomerID,
  TransactionDate,
  Price,
  SUM(Price) OVER (PARTITION BY CustomerID ORDER BY TransactionDate) AS RunningTotal
FROM
  dbo.Purchases

The syntax for our OVER() clause is simple:

SUM(Price) specifies which column we are creating the running total on
PARTITION BY specifies around what group of data we want to create our "window" — each new window will reset the running total
ORDER BY specifies in what order the rows should be sorted before being summed

The results? An easy to write running total:

52df1-1_ulawysmk6gmrjzvuaijgq

My Most Embarrassing SQL Moment

Published Tue 11 July 2017 in SQL > TSqlTuesday

T-SQL Tuesday #92: Lessons Learned the Hard Way

55ae8-1lh0mvkliatliiikt0vlyow

This post is a response to this month's T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month's topic is Lessons Learned the Hard Way.

Watch this week's video on YouTube

"Is this your query that's killing the server?"

It was my first week on the job and I was learning to query one of our major databases.

Up until that point, my SQL experience was limited to working on a *tiny* e-commerce database. Query performance was never something I had to deal with because any query I wrote, no matter how poorly written, would always execute quickly.

This new database I was working on though had tables with a billion+ rows. I should have been more conscious about how I was writing my joins and filtering my records, but I wasn't. I wrote my query and executed it in SQL Server Management Studio.

About 20 minutes into my query's execution, I received an email from my new DBA, and it looked something like this:

Uhh, there might be a problem here

"Is this your query that's killing the server?"

Oops.

I don't think my mouse ever moved to the stop execution button as quickly as it did that moment.

I was incredibly embarrassed to have brought our production server to a crawl. I was also incredibly embarrassed to have had my first interaction with my new DBA be about a query that created major problems for him.

Although there were no long-term damages from my server-crushing query, it was a scenario that I definitely didn't want to relive again in the future.

Next time: don't do that again

Obviously, this was an experience where I learned that maybe I shouldn't write queries against unfamiliar data in production.

I should have been practicing on a dev database.
I should have looked at table meta data and made sure I understood relationships between tables better.
I should have done some more preliminary querying with more restrictive filters to be able to catch performance problems earlier on with smaller result sets.
I should have examined what indexes were available and made sure I was attempting to use them.
I should have used a (NOLOCK) if I absolutely had to test on the production data so that at the very least I wouldn't prevent the high transaction ETLs from modifying data in that database.

All of those "should haves" quickly became my checklist for what to do before running any query in an unfamiliar environment.

I've still written plenty of ugly and inefficient queries since then, however none of them ever caused me to bring the SQL server to a halt like I did in my first week. That was one lesson that I learned the hard way.