Data with Bert logo

Execution Plans: Statistics

Watch this week's video on YouTube

Last week we looked at what execution plans are and how you can view them.

This week I want to discuss what data the query optimizer uses to make cost-based calculations when generating execution plans. Understanding the meta-data SQL Server uses to generate query plans will later help us correct poorly performing queries.

Statistics

Statistics are the primary meta data used by the query optimizer to help estimate the costs of retrieving data for a specific query plan.

The reason SQL Server uses statistics is to avoid having to calculate information about the data during query plan generation. For example, you don't want to have the optimizer scan a billion row table to learn information about it, only to then scan it again when executing the actual query.

Instead, it's preferable to have those summary statistics pre-calculated ahead of time. This allows the query optimizer to quickly generate and compare multiple candidate plans before choosing one to actually execute.

Creating and Viewing Statistics

Statistics are automatically maintained when the Auto Create Statistics and Auto Update Statistics database properties are on (in general, you want to leave these on):

2019-07-29-20-31-36

Now, anytime an index is created or enough data in that index has changed (see thresholds for the amount of data in the SQL Server documentation), the statistics information will be updated and maintained as well.

To view the statistics for an index or column, you can run DBCC SHOW_STATISTICS:

DBCC SHOW_STATISTICS('dbo.Users','PK_Users_Id')

2019-07-29-20-32-58

Statistics Data

The data returned by DBCC SHOW_STATISTICS is what SQL Server uses to interpret what the data in your index looks like.

I don't to turn this into a deep dive on statistics , but here are a few key things you should be aware of in this statistics output: - Rows: the number of rows in the table. - Rows Sampled: how many rows were used to calculate these statistics. This can be a subset or all of the rows of your table. - All Density: a measure of how many distinct values are in your data. - Histogram: shows frequencies for up to 200 ranges of values in your index.

Altogether, this information helps paint a picture of your data for the SQL Server query optimizer to use for deciding how it should retrieve your data.

When this picture is accurate, the query optimizer makes decisions that allow your queries execute relatively efficiently. If it has correct estimates about the number of rows it needs to process and the uniqueness of values within those rows, then the query optimizer will most likely allocate appropriate amounts of memory, choose efficient join algorithms, and make other performance choices that will make your query run efficiently.

When the statistics don't accurately represent the data, the query optimizer still tries to do the best with the information it has available but often ends up making some less-than-optimal choices.

We'll save troubleshooting statistics information for another post, but for now be aware that if your statistics are inaccurate, there is a pretty good change SQL Server won't generate efficient execution plans.

Cardinality Estimators

Having high level statistics about your data is only the first part of the equation. SQL Server needs to use those statistics alongside some assumptions to estimate how expensive it will be to retrieve that data.

The assumptions SQL Server makes about data are contained within the Cardinality Estimator. The Cardinality Estimator is a model that makes assumptions about things like data correlation, distribution, and more.

There are two cardinality estimating models currently available in SQL Server: the Legacy Cardinality Estimator available from SQL Server 7.0 to SQL Server 2012, and the New Cardinality Estimator introduced in SQL Server 2014. Each of these Cardinality Estimators makes different assumptions about your data and can therefore produce different results. General guidance suggests to use the New Cardinality Estimator for new work, however sometimes it can be beneficial to use the Legacy Cardinality Estimator if its assumptions more closely align with your data.

Once again, I don't want to go too deep on the differences between cardinality estimators, but it's worth knowing that if you are getting bad estimates from the New Cardinality Estimator, you can easily revert to the Legacy Cardinality Estimator by appending the following hint to the end of your query:

OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));  

Conclusion

SQL Server pre-calculates statistics data on indexes and columns to provide the Query Optimizer information about the data it needs to retrieve. When these statistics accurately reflect what the data actually looks like, SQL Server will typically generate execution plans that perform well. When this data is outdated or missing, SQL Server makes an uninformed guess about how to retrieve the data, potentially causing serious performance issues.

Introduction to SQL Server Execution Plans

Watch this week's video on YouTube

I nearly always use execution plans as the starting point for SQL query performance troubleshooting. In this multi-part series, I plan to review the basics of execution plans and how you can use them to improve the performance of your own queries.

Execution Plans

SQL is a declarative language: instead of programming the details of how data should be retrieved, we describe what data we want and SQL Server figures out how it should return it to us.

For most queries that we want to performance tune, the SQL Server Query Optimizer considers multiple options for returning the data. It estimates the costs of these various approaches and lands on one that it thinks will return the data in an efficient manner. All of these options for retrieving the data are what are known as query execution plans.

Most of the time, this process works well and the Query Optimizer ends up choosing a query plan that returns the requested data quickly. This however does not mean that SQL Server chooses the "most efficient" query plan. Besides debating what "most efficient" means in different scenarios, SQL Server doesn't want to spend hours calculating every possible way to return your data when it only takes a fraction of that time to find a plan that is "good enough".

Problems arise when the Query Optimizer selects a query plan that isn't very good at all. This can happen due to a number of issues that we'll look at in future parts of this series.

Today we'll focus on learning how we can obtain execution plans for our queries.

Viewing Execution Plans

To see an execution plan for your query, you can run SET SHOWPLAN_ALL ON. This will provide a text-based tree representation of the plan:

SET SHOWPLAN_ALL ON
GO
--Query
GO
SET SHOWPLAN_ALL OFF
GO

2019-07-22-20-17-57

This may look familiar to you if you are used to looking at plans in other relational database environment that use similar tree-based plan explanations. In SSMS we have a a more visual option available to though. If we click the "Display Estimated Execution Plan" button or run SET SHOWPLAN_XML ON and execute our query, we will get our graphical execution plan:

2019-07-22-20-20-52

2019-07-22-20-21-45

This graphical representation is my personal preference (and what we will be mostly focusing on in this series) but it's important to know that as hinted by the last command, you can also right click on this graphical execution plan and choose "Show Execution Plan XML" to see the XML that is driving all of the visuals:

2019-07-22-20-22-31

2019-07-22-20-23-19

While most people don't find the XML as easy to digest as the graphical execution plan, it's worth knowing it is there: sometimes you will have to dive into the XML to find properties that don't get displayed in the graphical version.

Actual Plans

So far, every plan we've looked at is what's known as an "estimated" execution plan. The "estimated" name means it only contains "estimates" of how many rows will be processed based on internal meta data that SQL Server has available. You can view the "actual" execution plan by selecting the appropriate icon in SSMS:

2019-07-22-20-24-54

2019-07-22-20-25-19

There often confusion that occurs due to the naming of "estimated" vs "actual" execution plans. The difference is that the estimated plan is calculated before executing the SQL statement so it only has estimated meta data available for it to display, whereas the actual execution plan is that same estimated execution plan overlayed with runtime information like how many rows were processed, how much memory was used, etc…

Live Query Statistics

Live Query statistics give you the best of both estimated and actual execution plans. With Live Query Statistics enabled, SQL Server provides the estimated execution plan but overlays live runtime statistics on top of the plan as the query is executing in real-time.

2019-07-22-20-27-23

livequerystatistics

Live Query Statistics are nice because they allow you to often see "where" in the execution plan a query is experiencing a performance bottleneck. This is particularly helpful if you are new to analyzing execution plans and haven't yet learned all of the common signs and operators that might indicate poor performance. It is also helpful when your query is performing so poorly that you are unable to retrieve an actual plan for it (since it executes for what seems like forever).

Historical Plans

Calculating query plans isn't free, so SQL Server caches query plans for reuse. These cached plans can be viewed in the sys.dm_exec_query_plan DMV:

SELECT 
    *
FROM 
    sys.dm_exec_cached_plans      
    CROSS APPLY sys.dm_exec_query_plan(plan_handle)
    CROSS APPLY  sys.dm_exec_sql_text(plan_handle) 

2019-07-22-20-36-25

It's worth noting that this set of DMVs only show plans that are still in the cache and do not show actual plan statistics in them (plus some other limitations).

If you have Query Store enabled on your database, you can also access the query plans stored in the Query Store DMVs (or via the Query Store GUI):

SELECT 
    CAST(p.query_plan AS XML), 
    *
FROM 
    sys.query_store_query AS q
    INNER JOIN sys.query_store_plan AS p
        ON q.query_id = p.query_id

Conclusion

Regardless of how and where you decide to retrieve your execution plan from, all of the above techniques will help provide insight into how SQL Server is obtaining the data that you specified in your query.

How I Continuously Learn About SQL Server

Watch this week's video on YouTube

In order to stay current in a technology you have to immerse yourself in community content.

Documentation is good for knowing the specification of a language or tool and books or training courses are great for when you want to dive deep on a particular topic. However in many technology fields, including SQL Server, one of the best ways to stay up to date is by following blogs.

Not only are blogs posts typically showing newer features and techniques that haven't been able to make their way into books and training courses yet, they often will cover more specialized scenarios and edge cases than you'll encounter in the traditional learning sources. Following a broad user base of SQL Server professionals who blog solutions to problems they encounter and will often open your eyes to features of the tools you use everyday that you weren't even aware of.

This is especially nice for times when I'm not actively learning a specific topic. Reading a variety of blogs on various SQL Server subjects often reminds me how much I don't know and how many features I don't use during my day-to-day work. These blogs also serve as inspiration to investigate ideas further.

RSS Feeds

My favorite way to follow other SQL Server blogs is by subscribing to RSS feeds. I use Feedly as my RSS feed reader of choice (RIP Google Reader) and browse through all of my feeds whenever I have downtime. Inspired by Brent Ozar's post last year, I'm sharing all of the RSS feeds I follow in this OPML file.

You can download the OPML file and import it into your RSS reader as a starting point, or go through the list of websites and see if you find something new. I decided to also include my non-SQL feeds in the above link too (they are categorized in appropriate sections) in case you are curious about what sites I follow for other programming and career topics.

It's impossible to stay up to date with all the information that I want to, but staying up to date on RSS feeds gets me close.

I'm always looking for more sources - RSS is so easy to consume - so if you have any suggestions please let me know!

SELECT Expression Execution Order

Watch this week's video on YouTube

Today I want to share with you an interesting observation I made about SELECT expression execution order.

I was working on writing a dynamic SQL query that would transform the following piece of JSON:

{"data":[["a","b","c"],["d","e","f"]]}

Into a query that looked like this:

SELECT 'a' AS Row0Column0, 'b' AS Row0Column1, 'c' AS Row0Column2 
UNION ALL
SELECT 'd' AS Row1Column0, 'e' AS Row1Column1, 'f' AS Row1Column2 

Normally I would use something like OPENJSON and PIVOT to transform the original data into a table result set, but in this instance I my requirements dictated that I needed to build the code as a series of SELECT and UNION ALL statements.

The first step in building this query was using OPENJSON to parse the JSON data into rows and value arrays:

SELECT
    *
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[Value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

image

This first query was a good start. I then added a variable @RowQuery and started building my dynamic SQL code to generate my SELECT and UNION ALL statements:

DECLARE 
    @RowQuery varchar(max)

/* TOP is here to get the ORDER BY to work as expected */
SELECT TOP 134960239460263
    @RowQuery =  COALESCE (@RowQuery + ' ','') + '''' + v.[value] + ''' as [Row'+r.RowNumber+'Column'+v.[key]+']'
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[Value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

PRINT 'SELECT ' + @RowQuery;

image-1

At this point I had the row/column numbering correct, but I still needed to add a UNION ALL SELECT before the start of each row.

I thought, "Oh, this is easy. Since the dynamic SQL I'm building is basically a loop, I need to check for a change in the RowNumber column's value to identify I'm on a new row. If I am, I can insert the UNION ALL SELECT text and I'll be all set":

DECLARE 
    @RowQuery varchar(max),
    @CurrentRow int = 0;

/* TOP is here to get the ORDER BY to work as expected */
SELECT TOP 134960239460263
    @RowQuery =  COALESCE (@RowQuery + '','') + IIF(r.RowNumber > @CurrentRow, CHAR(10)+'UNION ALL'+CHAR(10)+'SELECT ', ', ')+'''' + v.[value] + ''' as [Row'+r.RowNumber+'Column'+v.[key]+']',
    @CurrentRow = IIF(r.RowNumber > @CurrentRow, r.RowNumber, @CurrentRow)
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[Value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

/*remove the first comma and add an initial SELECT */
PRINT STUFF(@RowQuery,1,1,'SELECT'); 

image-2

Success! But as I was celebrating my dynamic SQL victory, I realized I was making an assumption about SQL Server that I had never thought about before:

The above query only works because SQL Server is executing the variables in the SELECT list sequentially. I'm incrementing @CurrentRow only after processing my @RowQuery variable, and this logic only works correctly if SQL Server executes the variable expressions in the order they appear in the SELECT list. If SQL Server was executing items in the SELECT list in reverse or random order, @CurrentRow could potentially get set BEFORE @RowQuery was evaluated, causing the logic of adding "UNION ALL SELECT" in the right location to fail.

This surprised me because I don't usually think about the column execution order of a query. Normally column expressions in the SELECT statement are independent of each other so the order that the columns are executed in doesn't really matter. But in this example, the column execution order does matter and it's reassuring to see SQL Server do what I assumed it was doing.

Now, I can't guarantee this always works. I tried but failed to think of a scenario where SQL Server wouldn't execute the columns in sequential order. While the query seemed to work as expected in all of the tests I ran, I'll leave this observation open ended in case anyone has ever encountered a scenario or has any ideas of when SQL Server doesn't process SELECT statement expressions in the order they are listed.

Behind the Scenes: How I Create Weekly Technical Videos

Watch this week's video on YouTube

Today I want to share my process for blogging and making weekly YouTube videos.

I've been hesitant to write this post because of how much I am still refining the process. Since I started making videos two years ago, not a few weeks go by without trying something new. I frequently try to improve my video and audio quality, learn new teaching strategies, and automate parts of the editing process.

So while I am excited to share what goes into making one of my videos, please be aware that it is still a work in progress.

My July 2019 Process for Creating Weekly Videos

Think of a topic - Monday/Tuesday

When I started blogging and making videos, thinking of topics to write about was difficult.

Then I started keeping a OneNote notebook to build up a list of ideas. Whenever I learned how to solve a problem or helped someone else solve a problem, I would write that idea down.

I also began outlining presentations I wanted to give and added a blog/video idea for each demo that I knew I wanted to include in my presentation. Not only did this help spread out the work required for creating a new presentation, but it gave me a lot of material to create for my blog and YouTube channel.

Today I have what feels like an endless list of topic ideas to write about. Every Monday and Tuesday I browse my list and choose what I think would be a good topic to use for the upcoming week's post/video. I take into consideration whether I discovered something particularly surprising recently, whether I need to create some demos for an upcoming presentation, and how busy my upcoming week is with other events.

The goal is that by Tuesday night I know what topic I'll want to work on during the upcoming week.

Create Demos - Wednesday/Thursday

On Wednesday and Thursday I write my demos.

Some weeks this is easy: often when I add a topic to my idea notebook, I include code that either partially or fully meets my demo needs.

Other weeks this step is more involved. This happens in particular when I need to pin down why a certain behavior is occurring. Maybe I discovered an interesting SQL Server behavior during my workday but didn't have time to dig into why it was happening. I might have found a quick fix to get me through the problem at work, but I want to do a little bit more research to get to the root cause for my post/video.

This is often my favorite part of the process, although sometimes it can also be the most frustrating. There have definitely been weeks where I start out with one idea in mind, but after struggling to build a solid demo, I choose a different topic for that week and stash away the original idea to try again some time in the future.

Film - Friday

I usually film my videos on Friday nights. Except when I film them on Saturday mornings.

Having a permanent studio setup in my basement has made this process much easier. Instead of needing to make sure my video background is clear of toys/general clutter, needing to set up lights and cameras, and asking for perfect silence from everyone in the house, I now have a space I can just sit down and record without any fuss.

I shoot my videos in one continuous take that later gets edited for content. I've listed the gear I use to shoot on at the bottom of this post, but to get a good feel for my setup check out this week's video starting at the 2:54 mark. I think it gives a cool example of how much lighting matters in these types of videos, as well as the benefits of post-production color grading.

Edit - Saturday/Sunday

Editing the raw video and audio recordings into a short and (hopefully) entertaining and educational video is where I have evolved my process the most.

Since I record everything in one long take, the first thing I do is sync up all of the various footage from cameras, screen captures, and audio recordings.

I then use Adobe Premiere's Multi-Camera editor to produce the rough cut of the video. This process is a lot like producing a live TV show - I use keyboard shortcuts to jump between video and audio tracks in real-time. By the time I finish watching the raw footage, I have a pretty good edit of the final video.

Once I have the initial edit done, I go back and start adding whatever additional overlays and screenshots I want to include. This part used to be a huge burden because Premiere requires 8-10 clicks just to get a screenshot added to the editing timeline.

Then I watched this 4-hour long video of a professional editor editing a YouTube video in real-time. The key takeaway was how much he is able to automate using AutoHotkey.

Inspired, I wrote my own AutoHotkey script to copy screen caps from SnagIt directly into Premiere with a single keyboard shortcut. This saves me a TON of time, not to mention makes the process of editing much more enjoyable.

Although all of this editing is time consuming, it's also fun. I create tightly edited videos like this because it's the type of video that I've always wished there was more of in the YouTube programming space. I figured if these are the kinds of videos that I want to watch, then hopefully other people will want to watch them too (and you have been, thank you!!!!)

Blog and Schedule - Monday

By Sunday, my video is usually done and uploaded (but not shared) on YouTube.

Sometimes I write my blog post on Monday nights, sometimes it gets written earlier in the process. Writing the post usually goes quickly since all of work of creating demos and thinking of a logical structure has already been prepared for the video.

Once the post is written, I schedule the post and video to automatically publish the next day.

Tuesday morning the post and video go live at 7am and the whole process repeats itself later that evening.

Tools of the Trade

Below is the list of gear I currently use. These are all tools I currently use to produce my online videos. Some of the links below are affiliate links to help fund my coffee consumption during those early morning editing sessions.

Software and Subscriptions

Epidemic Sound - Royalty free music and sound effects. Easy to search. I used to scour the internet for royalty free tracks, and while there are a lot out there, I got tired of searching through all of the junk to find the high quality recordings.

Adobe Creative Cloud - I've been using Photoshop and Premiere for decades. New versions of Premiere are generally buggy and terrible, so I usually wait a few months until a few updates come out after a release before installing a new version (and always install it side by side with the old version).

SnagIt - Lightweight screen capture software, both for stills and video. TechSmith also makes Camtasia which is a more full featured with its capturing and editing abilities.

Handbrake - I use this to convert my variable frame rate SnagIt screen recordings into constant frame rate videos that will sync nicely with my video footage in Premiere

Gear

Canon 80D DSLR - I bought this camera specifically for shooting video and it has proven to be a great value. Depending on where I am shooting, I will use the the 18-55mm kit lens, a Canon 10-18mm EFS lens (great for hand held selfie shots), or a Canon 50m f/1.4 EF lens (when set up in my studio).

Manfrotto Aluminum Tripod - Great solid tripod. I've owned several of these throughout the years and they perform great. I pair this with my favorite tripod head, the Manfrotto Joystick, which gives full 360-degree smooth adjustments.

JOBY GorillaPod - I love this little tripod when shooting handheld or when traveling because I don't have to bring full sized tripods with me (although, that means I'm usually moving tables and standing near objects that the GorillaPod can mount to).

RODE VideoMic - My go-to when I am recording within a few feet of the camera. Captures great audio, and the windscreen doubles as a dog-toy (not really...but dogs will try to eat it).

Tascam DR-05 - I use this when needing to capture 360-degree sound (typically when multiple people are talking on stage). In addition to using the built in microphones (which are decent), I also use it to record sound from an external microphone.

RODE Lavalier Microphone - A clip on mic. Essential when I'm standing far away from the camera or in noisy environments.

Studio backdrop and lights- This kit includes backdrops, stands, and lights. It's not high quality, but if you aren't afraid of modifying things to fit your needs this works really well as a starter pack. I replaced the included bulbs with 100W LED equivalents and rigged it up so that there were two lights in each softbox.

GVM LED Panel Light - My favorite light that's easy to take with me wherever I go. Lots of good light as long as you keep it within a few feet of you.

Mini Video Lights - Great for adding accent lights. I usually pair these with some colored gels to get nice background colors.