Pinal Dave Helps Me Fix My Performance Tuning Problems

Published Tue 24 July 2018 in SQL > Performance Tuning

This week I was fortunate enough to film a video in collaboration with Pinal Dave, the SQL Authority himself. Pinal is creative, hilarious, and kind; making this video with him was A BLAST!

Although the video is a little tongue in cheek, Pinal's recommendations are very real: I've encountered plenty of scenarios where these solutions fixed slow queries. Will these recommendations fix the problem in every situation? Of course not, but they are a great place to start.

Instead of creating a text version of the concepts covered in the video (you should really watch it), I thought it would be fun to do a behind-the-scenes narrative of how the video came together because it is unlike any other project I've done before.

The Idea

After agreeing to make a video together, we tossed around a few ideas. Because we live in different time zones, we thought it would be a fun to do something where I kept waking Pinal up in the middle of the night.

We iterated over what SQL Server examples to use (originally the second example was going to show my queries running out of space because autogrowth being turned off). We also ended up adding another example after my wife suggested that having it build to three scenarios instead of two would be funnier - I agree!

Asynchronous Filming

You've probably already figured it out, but I didn't really wake Pinal up in the video (honestly, I think midnight would be too early to wake him up anyway; in our back and forth emails, I was seeing responses from him that were in the 1-2am range).

I filmed a preliminary version of my parts of the video, very roughly edited them together, and sent it over to Pinal.

He then filmed his segments, giving me lots of great footage (I'm not sure if it was ad-libbed or not, but I was dying of laughter when watching through his clips).

Then I re-filmed my parts to try to match his dialog as closely as possible. Re-filming my parts also allowed me to self-edit and not ramble as much.

Everything Else

After that, it was just the usual process of editing, color correction, audio processing, etc...

I'm happy with how it turned out, especially given all of the technical challenges we had with filming separately.

Major thanks again to Pinal for being supportive and willing to make a fun SQL Server video. Enjoy!

Join Elimination: When SQL Server Removes Unnecessary Tables

Published Tue 19 June 2018 in SQL > Development

In this post and video at SQLPerformance.com, I discuss what join elimination is and how it works in SQL Server. Join elimination is one of the many techniques that the SQL Server query optimizer uses to create efficient query plans. Specifically, join elimination occurs when SQL Server can establish equality by using query logic or trusted database constraints to eliminate unnecessary joins... Read more or watch the video

Is It Possible To Conditionally Index JSON Data?

Published Tue 01 May 2018 in SQL > Development > JSON

Watch this week's video on YouTube

Recently I received a great question from an attendee to one of my sessions on JSON (what's up Nam!):

2018-04-25_15-58-21

At first glance it sounds like a filtered index question, and ultimately it is, but because of some of the intricacies involved in the response I thought it would make for a good blog post.

The Problem: Schema On Read

Imagine I have a central table that keeps track of warnings and errors for my burrito ordering app:

DROP TABLE IF EXISTS dbo.BurritoAppLog;
GO

CREATE TABLE dbo.BurritoAppLog 
( 
    Id int IDENTITY PRIMARY KEY,
    ErrorDetails nvarchar(1000)
); 
GO 

INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Warning", "MessageId": 100, "Severity": "High", "Information":"Running low on steak." }'); 
INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Warning", "MessageId": 50, "Severity": "Low", "Information":"Running low on queso." }');
GO 4000
INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Error", "MessageId": 10, "User":"Bert", "ErrorMessage":"Lettuce not available." }'); 
INSERT INTO dbo.BurritoAppLog VALUES (N'{"Type":"Error", "MessageId": 20, "User":"Jim", "ErrorMessage":"Cannot wrap burrito with quadruple meat." }'); 
GO 100

2018-04-25_19-21-04

Now imagine wanting to generate a report of only the rows that are errors.

Obviously, you'd want to index this data for faster querying performance. Adding a non-clustered index on a non-persisted computed column of our JSON "Type" property will accomplish that:

ALTER TABLE dbo.BurritoAppLog 
ADD ErrorType AS JSON_VALUE(ErrorDetails, '$.Type');

ALTER TABLE dbo.BurritoAppLog 
ADD MessageId AS JSON_VALUE(ErrorDetails, '$.MessageId');

CREATE INDEX IX_ErrorType ON dbo.BurritoAppLog (ErrorType) INCLUDE (MessageId);

SELECT MessageId FROM dbo.BurritoAppLog WHERE ErrorType = 'Error'

And that works great. Except that error entries in our table make up only 2.5% of our total rows. Assuming we'll never need to query WHERE ErrorType = 'Warning' , this index is using a lot of unnecessary space.

So what if we create a filtered index instead?

Filtered JSON Indexes...

A filtered index should benefit us significantly here: it should save us space (since it won't include all of those warning rows) and it should make our INSERT queries into this table faster since the index won't need to be maintained for our non-"Error" rows.

So let's create a filtered index:

CREATE INDEX FX_ErrorType ON dbo.BurritoAppLog (ErrorType) INCLUDE (MessageId) WHERE ErrorType = 'Error'

Oh.

2018-04-25_19-47-03-1

So I guess we can't create a filtered index where the filter is on a computed column. Maybe SQL Server won't mind if we persist the computed column?

DROP INDEX IX_ErrorType ON dbo.BurritoAppLog

ALTER TABLE dbo.BurritoAppLog
DROP COLUMN ErrorType;

ALTER TABLE dbo.BurritoAppLog 
ADD ErrorType AS JSON_VALUE(ErrorDetails, '$.Type') PERSISTED;

CREATE INDEX FX_ErrorType ON dbo.BurritoAppLog (ErrorType) INCLUDE (MessageId) WHERE ErrorType = 'Error'

NOOOOOOPPPPEEEE. Same error message.

The issue is that SQL Server does not like computed columns, persisted or not, in a filtered index's WHERE clause. It's one of the many limitations of filtered indexse (Aaron Bertrand has a great post outlining many of the shortcomings).

Computed Column Filtered Index Workaround

What is a performance minded, space-cautious, JSON-loving developer supposed to do?

One workaround to get our filtered index would be to parse our ErrorType property into its own table column on insert:

ALTER TABLE dbo.BurritoAppLog 
ADD PermanentErrorType varchar(10);

UPDATE dbo.BurritoAppLog SET PermanentErrorType = JSON_VALUE(ErrorDetails, '$.Type');

2018-04-25_20-01-45

With our PermanentErrorType column in place, we have no problem generating our filtered index:

CREATE INDEX FX_PermanentErrorType ON dbo.BurritoAppLog (PermanentErrorType) INCLUDE (MessageId) WHERE PermanentErrorType = 'Error'

If we compare the sizes of our nonclustered index to our filtered index, you'll immediately that the filtered index is significantly smaller:

2018-04-25_20-12-31-1

However, our table size is now slightly larger because of the added table column.

Conclusion

So what do you do if you run into this situation? Well, if the ratio of undesired records to desired records is large like in the example above, you might want to make a permanent column to include in your filtered index - the size/performance benefit is certainly there. This does mean that your table size will be larger (additional column) but performance will be faster if your queries are able to use the smaller filtered index.

AT TIME ZONE: The easy way to deal with time zones and daylight savings time

Published Tue 27 March 2018 in SQL > Performance Tuning

Watch this week's video on YouTube

Many people think daylight savings time was created to help align the hours that the sun is up with our waking hours so farmers and school children didn't have to be outside in the dark. Or something like that.

Well let me fill you in on a little secret - daylight savings time was actually created by a government works project to ensure that programmers could forever write tedious conditional logic in their date-based queries to handle the date time conversions surrounding time zones and daylight savings time.

While these types of datetime conversions have historically been a pain, SQL Server 2016 introduced the AT TIME ZONE feature makes these types of conversions a breeze.

Defining Time Zone

Let's say you have a datetime value that you know is encoded in UTC (if you don't know what timezone your data was originally encoded in you're out of luck):

2018-03-26_11-54-21

Besides naming convention, there's nothing that tells us that our datetime is in UTC. But if we know that to be the case, we can use AT TIME ZONE to add an UTC offset to the timestamp to make it official:

2018-03-26_11-55-07

See that +00:00 at the end of our value? That's our time zone offset - it's basically telling us how many hours and minutes away from UTC our date is stored in. With this offset in place, our UTC datetime isn't encoded only by the variable name - it's actually encoded in the data itself.

Converting Time Zones

Now that we have a well-defined UTC datetime, we can use AT TIME ZONE to easily convert our data to different time zones.

For example, if we want to convert our UTC datetime to Eastern Standard Time, we can tack on AT TIME ZONE 'Eastern Standard Time' :

2018-03-26_12-03-37

The time portion of our datetime is now showing 7am with the offset indicating we are 4 hours behind UTC. Easy time zone conversions, yes!

But What About Our Farmer Friends?

So AT TIME ZONE makes it easy to convert between time zones - but how does it handle daylight savings time conversions?

Well, on March 11, 2018 I lost an hour of my life when daylight savings time kicked in at 2am. Does AT TIME ZONE recognize this theft?

2018-03-26_12-22-07

YES! Say so long to conditional conditions that subtract an amount of time based on the date; AT TIME ZONE handles the switch to DST with ease as noted by the offset change from -05:00 to -04:00 .

And just like that, programming for time zone and day light savings time logic just became a little easier.

But I Don't Live In The Eastern Standard Time Zone!

No problem, in addition to AT TIME ZONE, SQL Server 2016 added this nifty table that will give you the names of all of the time zones it supports:

select * from sys.time_zone_info

2018-03-26_12-28-54

OR vs UNION ALL - Is One Better For Performance?

Published Tue 20 February 2018 in SQL > Performance Tuning > Query Rewrites

Today I want to show you a trick that could make your queries run faster.

It won't always work, but when it does everyone will be impressed with your performance tuning prowess. Let's go!

Watch this week's video on YouTube

Our Skewed Data

Let's create a table and insert some data.

Notice the heavily skewed value distribution. Also notice how we have a clustered index and a very skimpy nonclustered index:

DROP DATABASE IF EXISTS ORUnionAll
CREATE DATABASE ORUnionAll
GO

CREATE TABLE ORUnionAll.dbo.TestData
(
    Col1 int,
    Col2 char(200),
    Col3 int 
)
GO

INSERT INTO ORUnionAll.dbo.TestData VALUES (1,'',1)
GO 10000
INSERT INTO ORUnionAll.dbo.TestData VALUES (2,'',2)
GO 50
INSERT INTO ORUnionAll.dbo.TestData VALUES (3,'',3)
GO 50

CREATE CLUSTERED INDEX CL_Col1 ON ORUnionAll.dbo.TestData ( Col1 )
GO

CREATE NONCLUSTERED INDEX IX_Col3 ON ORUnionAll.dbo.TestData (Col3)
GO

If we write a query that filters on one of the low-occurrence values in Col3, SQL Server will perform an index seek with a key lookup (since our skimpy nonclustered index doesn't cover all of the columns in our SELECT):

SELECT 
    Col2, Col3
FROM 
    ORUnionAll.dbo.TestData
WHERE
    Col3 = 2

index-seek-with-lookup

If we then add an OR to our WHERE clause and filter on another low-occurrence value in Col3, SQL Server changes how it wants to retrieve results:

SELECT 
    Col2, Col3
FROM 
    ORUnionAll.dbo.TestData
WHERE
    Col3 = 2 OR Col3 = 3

index-scan-with-ors

Suddenly those key-lookups become too expensive for SQL Server and the query optimizer thinks it'll be faster to just scan the entire clustered index.

In general this makes sense; SQL Server tries to pick plans that are good enough in most scenarios, and in general I think it chooses wisely.

However, sometimes SQL Server doesn't pick great plans. Sometimes the plans it picks are downright terrible.

If we encountered a similar scenario in the real-world where our tables had more columns, more rows, and larger datatypes, having SQL Server switch from a seek to a scan could kill performance.

So what can we do?

Solutions...maybe

The first thing that comes to mind is to modify or add some indexes.

But maybe our (real-world) table already has too many indexes. Or maybe we are working with a data source where we can't modify our indexes.

We could also use the FORCESEEK hint, but I don't like using hints as permanent solutions because they feel dirty (and are likely to do unexpected things as your data changes).

One solution to UNION ALL

One solution that a lot of people overlook is rewriting the query so that it uses UNION ALLs instead of ORs.

A lot of the time it's pretty easy to refactor the query to multiple SELECT statements with UNION ALLs while remaining logically the same and returning the same results:

SELECT 
    Col2, Col3
FROM 
    ORUnionAll.dbo.TestData
WHERE 
    Col3 = 2
UNION ALL
SELECT 
    Col2, Col3
FROM 
    ORUnionAll.dbo.TestData
WHERE
    Col3 = 3

Sure, the query is uglier and will be a bigger pain to maintain if you need to make changes in the future, but sometimes we have to suffer for ~~fashion~~ query performance.

But does our UNION ALL query perform better?

union-alls-with-seeks

Well the plan shows seeks, but as Erik Darling recently pointed out, seeks aren't always a good thing.

So let's compare the reads of the OR query versus the UNION ALL query using SET STATISTICS IO ON:

logical-reads

So in this case, tricking SQL Server to pick a a different plan by using UNION ALLs gave us a performance boost. The difference in reads isn't that large in the above scenario, but I've had this trick take my queries from minutes to seconds in the real world.

So the next time you are experiencing poor performance from a query with OR operators in it, try rewriting it using UNION ALLs.

It's not always going to fix your performance problem but you won't know until you give it a try.