Data with Bert logo

How I Troubleshoot SQL Server Execution Plans

Watch this week's video on YouTube

Today I'm concluding my series on SQL Server execution plans by sharing the specific steps I take when troubleshooting a slow performing query.

Getting the Execution Plan

In part 1 of this series we discussed the different types of execution plans available and how to view them. My preference is to start with the poor performing query and request an execution plan for it.

2019-08-20-20-24-24

With the query pasted and formatted in my SSMS editor window, I like retrieving the estimated execution plan first, and then pasting the query into a second editor window and executing the query with the "Include Actual Execution Plan" option turned on. For bonus points, I'll split the SSMS window vertically so I can start looking at the estimated execution plan while the query runs and returns the actual execution plan: I like this combination because I (almost) immediately receive my estimated execution plan and can start looking for problems. Once the query on the right finishes executing and I get the actual plan with all of its lovely run-time stats, I usually switch to that looking at that one.

Some people take the opposite approach, looking at a cached query plan first before running the query to get a plan. This is fine too, but from my experience running the query first gives me some other data points to work with: is this query still running slowly, or was it a temporary issue? Did the person I'm reviewing the query for make some mistake and saw poor performance for some other unrelated reason? Getting the execution plan myself will help answer some of these questions.

livequerystatistics

But flaws exist with my process too. For example, running the query in an SSMS window may generate a new plan instead of reusing a cached plan (if something like white space is different between the two query texts). This is one of those things that is fine as long as you remember that's what could be happening. It even acts as a benefit because if you do get different query plans between your run and what exists in cache, then you know you might have a parameter sniffing problem. Finally, if the actual execution plan is impractical to retrieve (the query takes too long to run), then I will turn on Live Query Statistics: I don't start with this option because most queries I deal with will return by the time I finish looking over the estimated execution plan (and at that point the additional overhead of Live Query Statistics isn't worth it). But if I don't have time to wait for the actual plan to return, switching to Live Query Statistics and watching the long running query execute in real time will usually help diagnose the performance issue.

Unexpected Seeks vs Scans

Once I am viewing one of the execution plans above, one of the first things I check is where my data is being retrieved from.

If I see data coming primarily coming from nonclustered indexes, I feel pretty good that the data is being retrieved efficiently since only a limited number of columns are being returned and hopefully they are coming back in a beneficial sort order (this is going on the assumption that I keep my indexes pretty narrow to begin with).

If all I see are index scans, that may be fine, but I want to make sure:

  1. I am not seeing table scans - at the very least they should be clustered index scans,
  2. I am not using an unnecessary SELECT * in my query - why read in all of that extra data into memory or prevent a narrower index usage if I don't need it,
  3. SQL Server isn't scanning an entire index to return only a limit subset of rows

Finally, I like to double check any index seeks I see as part of key lookups. Once again, key lookups are not necessarily bad, but if I can include just one more column into a nonclustered index and get rid of the lookup, I may consider doing so if that part of the plan seems to be a bottleneck.

Inaccurate Row Estimates

Next I start looking at costly operators and checking their Actual Number of Rows vs Estimated Number of Rows values (for queries that don't return quickly, Live Query Statistics helps identify these easily). I may also look at the relative size of arrows to see if it looks like one operator is returning or reading significantly more rows than I would expect.

If Actual vs Estimated Number of Rows are vastly different (generally, if Actual is greater than Estimated by more than 100x), I start thinking about why SQL Server might be estimating the wrong number of rows by using common sense or looking at the statistics being used.

At this point I also consider whether the query is parameterized (either explicitly in the query or automatically by SQL Server). If it is, I usually start going down the path of checking for parameter sniffing.

Suspicious Operators

Next I look for any other commonly problematic operators in my plan: things like sorts, spools, hash joins, etc…

I've already covered these in a previous post of my Execution Plans series, but it's worth restating that I'm always keeping a look out for these operators.

Warnings

Finally, I quickly scan the plan for any yellow exclamation points present on any of the plan operators. These symbols indicate activities that SQL Server thinks it should warn us about. I also covered these in more detail in part 3, but it's worth mentioning again here because looking for these warnings can be a huge help in identifying the troublesome parts of your execution plan.

Conclusion

There is no one right way to troubleshoot or performance tune a query. I use the above method because it makes sense for my world where most of the queries I performance tune are my own and I have a pretty good knowledge of what other types of queries are running on my database. I hope sharing this process can help you develop your own preferred process for query tuning in your own environment.

Execution Plans: Statistics

Watch this week's video on YouTube

Last week we looked at what execution plans are and how you can view them.

This week I want to discuss what data the query optimizer uses to make cost-based calculations when generating execution plans. Understanding the meta-data SQL Server uses to generate query plans will later help us correct poorly performing queries.

Statistics

Statistics are the primary meta data used by the query optimizer to help estimate the costs of retrieving data for a specific query plan.

The reason SQL Server uses statistics is to avoid having to calculate information about the data during query plan generation. For example, you don't want to have the optimizer scan a billion row table to learn information about it, only to then scan it again when executing the actual query.

Instead, it's preferable to have those summary statistics pre-calculated ahead of time. This allows the query optimizer to quickly generate and compare multiple candidate plans before choosing one to actually execute.

Creating and Viewing Statistics

Statistics are automatically maintained when the Auto Create Statistics and Auto Update Statistics database properties are on (in general, you want to leave these on):

2019-07-29-20-31-36

Now, anytime an index is created or enough data in that index has changed (see thresholds for the amount of data in the SQL Server documentation), the statistics information will be updated and maintained as well.

To view the statistics for an index or column, you can run DBCC SHOW_STATISTICS:

DBCC SHOW_STATISTICS('dbo.Users','PK_Users_Id')

2019-07-29-20-32-58

Statistics Data

The data returned by DBCC SHOW_STATISTICS is what SQL Server uses to interpret what the data in your index looks like.

I don't to turn this into a deep dive on statistics , but here are a few key things you should be aware of in this statistics output: - Rows: the number of rows in the table. - Rows Sampled: how many rows were used to calculate these statistics. This can be a subset or all of the rows of your table. - All Density: a measure of how many distinct values are in your data. - Histogram: shows frequencies for up to 200 ranges of values in your index.

Altogether, this information helps paint a picture of your data for the SQL Server query optimizer to use for deciding how it should retrieve your data.

When this picture is accurate, the query optimizer makes decisions that allow your queries execute relatively efficiently. If it has correct estimates about the number of rows it needs to process and the uniqueness of values within those rows, then the query optimizer will most likely allocate appropriate amounts of memory, choose efficient join algorithms, and make other performance choices that will make your query run efficiently.

When the statistics don't accurately represent the data, the query optimizer still tries to do the best with the information it has available but often ends up making some less-than-optimal choices.

We'll save troubleshooting statistics information for another post, but for now be aware that if your statistics are inaccurate, there is a pretty good change SQL Server won't generate efficient execution plans.

Cardinality Estimators

Having high level statistics about your data is only the first part of the equation. SQL Server needs to use those statistics alongside some assumptions to estimate how expensive it will be to retrieve that data.

The assumptions SQL Server makes about data are contained within the Cardinality Estimator. The Cardinality Estimator is a model that makes assumptions about things like data correlation, distribution, and more.

There are two cardinality estimating models currently available in SQL Server: the Legacy Cardinality Estimator available from SQL Server 7.0 to SQL Server 2012, and the New Cardinality Estimator introduced in SQL Server 2014. Each of these Cardinality Estimators makes different assumptions about your data and can therefore produce different results. General guidance suggests to use the New Cardinality Estimator for new work, however sometimes it can be beneficial to use the Legacy Cardinality Estimator if its assumptions more closely align with your data.

Once again, I don't want to go too deep on the differences between cardinality estimators, but it's worth knowing that if you are getting bad estimates from the New Cardinality Estimator, you can easily revert to the Legacy Cardinality Estimator by appending the following hint to the end of your query:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));  

Conclusion

SQL Server pre-calculates statistics data on indexes and columns to provide the Query Optimizer information about the data it needs to retrieve. When these statistics accurately reflect what the data actually looks like, SQL Server will typically generate execution plans that perform well. When this data is outdated or missing, SQL Server makes an uninformed guess about how to retrieve the data, potentially causing serious performance issues.

Pinal Dave Helps Me Fix My Performance Tuning Problems

Watch this week's video on YouTube

This week I was fortunate enough to film a video in collaboration with Pinal Dave, the SQL Authority himself.  Pinal is creative, hilarious, and kind; making this video with him was A BLAST!

Although the video is a little tongue in cheek, Pinal's recommendations are very real: I've encountered plenty of scenarios where these solutions fixed slow queries.  Will these recommendations fix the problem in every situation?  Of course not, but they are a great place to start.

Instead of creating a text version of the concepts covered in the video (you should really watch it), I thought it would be fun to do a behind-the-scenes narrative of how the video came together because it is unlike any other project I've done before.

The Idea

After agreeing to make a video together, we tossed around a few ideas.  Because we live in different time zones, we thought it would be a fun to do something where I kept waking Pinal up in the middle of the night.

We iterated over what SQL Server examples to use (originally the second example was going to show my queries running out of space because autogrowth being turned off).  We also ended up adding another example after my wife suggested that having it build to three scenarios instead of two would be funnier - I agree!

Asynchronous Filming

You've probably already figured it out, but I didn't really wake Pinal up in the video (honestly, I think midnight would be too early to wake him up anyway; in our back and forth emails, I was seeing responses from him that were in the 1-2am range).

I filmed a preliminary version of my parts of the video, very roughly edited them together, and sent it over to Pinal.

He then filmed his segments, giving me lots of great footage (I'm not sure if it was ad-libbed or not, but I was dying of laughter when watching through his clips).

Then I re-filmed my parts to try to match his dialog as closely as possible.  Re-filming my parts also allowed me to self-edit and not ramble as much.

Everything Else

After that, it was just the usual process of editing, color correction, audio processing, etc...

I'm happy with how it turned out, especially given all of the technical challenges we had with filming separately.

Major thanks again to Pinal for being supportive and willing to make a fun SQL Server video.  Enjoy!