
Are Stored Procedures Faster Than Stand-Alone Queries?

Watch this week's video on YouTube

A few months ago I was presenting for a user group when someone asked the following question:

Does a query embedded in a stored procedure execute faster than that same query submitted to SQL Server as a stand-alone statement?

The room was pretty evenly split on the answer: some thought stored procedures would always perform faster, while others thought it wouldn't really matter.

In short, the answer is that the query optimizer will treat a query defined in a stored procedure exactly the same as a query submitted on its own.

Let's talk about why.

Start with a Plan

While submitting an "EXEC <stored procedure>" statement to SQL Server may require fewer packets of network traffic than submitting the several hundred (thousands?) lines that make up the query embedded in the procedure itself, that is where the efficiencies of a stored procedure end*.

*NOTE: There are certain SQL Server performance features, like temporary object caching, natively compiled stored procedures for memory-optimized tables, etc., that will improve the performance of a stored procedure over an ad hoc query. However, in my experience, most people aren't utilizing these types of features, so it's a moot point.

After receiving the query, SQL Server's query optimizer treats both of these submissions exactly the same. It checks whether a cached plan already exists for either query (and if one does, it uses it); otherwise it sends both queries through the optimization process to find a suitable execution plan. If the stand-alone query and the query defined in the stored procedure are exactly the same, and all other conditions on the server are the same at the time of execution, SQL Server will generate the same plan for both queries.

To prove this point, let's look at the following query's plan as well as the plan for a stored procedure containing that same query:

CREATE OR ALTER PROCEDURE dbo.USP_GetUpVotes
    @UserId int
AS
SELECT  
    COUNT(*) AS UpVotes 
FROM 
    dbo.Posts p
    INNER JOIN Votes v
        ON v.PostId = p.Id 
WHERE 
    p.OwnerUserId = @UserId
    and VoteTypeId = 2
ORDER BY UpVotes DESC
GO

EXEC dbo.USP_GetUpVotes 23
GO

DECLARE @UserId int = 23

SELECT 
    COUNT(*) AS UpVotes 
FROM 
    dbo.Posts p
    INNER JOIN Votes v
        ON v.PostId = p.Id 
WHERE 
    p.OwnerUserId = @UserId
    and VoteTypeId = 2
ORDER BY UpVotes DESC


Comparing the execution plans, the optimizer generates identical plans for both the stand-alone query and the stored procedure. In the eyes of SQL Server, both of these queries will be executed in exactly the same way.
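If you want to verify this on your own server, one way is to compare the plan hashes of the two executions. Here's a rough sketch (the LIKE filter is just one assumed way of locating these two particular queries in the cache):

-- Both rows should share the same query_plan_hash if the plans are identical
-- (once cached, this diagnostic query will match its own filter too; ignore that row)
SELECT
    st.text,
    qs.query_hash,
    qs.query_plan_hash
FROM
    sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE
    st.text LIKE '%UpVotes%';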

But I Swear My Stored Procedures Run Faster!

I think a lot of the belief that stored procedures execute faster comes down to caching.

As I wrote about a little while back, SQL Server is very particular about needing every little detail about a query to be exactly the same in order for it to reuse its cached plan. This includes things like white space and case sensitivity.
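For example, the two statements below are logically identical, but the extra space in the second one is enough to make SQL Server cache a separate plan for each. A quick sketch you can try yourself (reusing the Posts table from above):

-- Identical logic, but the second statement has extra whitespace,
-- so each statement gets its own entry in the plan cache
SELECT COUNT(*) FROM dbo.Posts;
GO
SELECT COUNT(*)  FROM dbo.Posts;
GO

-- Check the cache: two separate single-use plans appear
SELECT cp.usecounts, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE 'SELECT COUNT(*)%FROM dbo.Posts%';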

It is much less likely that a query inside of a stored procedure will change compared to a query that is embedded in application code. Because of this, it's more likely that your stored procedures are executing from cached plans while your individually submitted query texts are not utilizing the cache. In that scenario the stored procedure really is executing faster, but only because it was able to reuse a cached plan. That's not a fair comparison though - if both queries pulled from the cache, or if both had to generate new execution plans, they would have the same execution performance.

So does it matter if I use stored procedures or not?

So while in the majority of cases a stand-alone query will perform just as quickly as that same query embedded in a stored procedure, I still think it's better to use stored procedures when possible.

First, embedding your query inside of a stored procedure increases the likelihood that SQL Server will reuse that query's cached execution plan as explained above.

Second, using stored procedures is cleaner for organization, storing all of your database logic in one location: the database itself.

Finally, and most importantly, using stored procedures gives your DBA better insight into your queries. Storing a query inside of a stored procedure means your DBA can easily access and analyze it, offering suggestions and advice on how to fix it in case it is performing poorly. If your queries are all embedded in your apps instead, it makes it harder for the DBA to see those queries, reducing the likelihood that they will be able to help you fix your performance issues in a timely manner.

Joining on NULLs

Watch this week's video on YouTube

It's important to be aware of columns that allow NULL values since SQL Server may handle NULLs differently than you might expect.

Today I want to look at what things to consider when joining on columns containing NULL values.
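As a quick refresher before diving in: in SQL Server, NULL is never equal to anything, not even another NULL, because the comparison evaluates to UNKNOWN rather than true:

-- Returns 'Not equal': NULL = NULL evaluates to UNKNOWN, not true
SELECT CASE WHEN NULL = NULL THEN 'Equal' ELSE 'Not equal' END;

-- IS NULL is the correct way to test for NULL values
SELECT CASE WHEN NULL IS NULL THEN 'Is null' ELSE 'Not null' END;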

Natural, Composite, NULLable keys

Let's pretend we have an Account table containing the accounts of various users and an AccountType table describing the different types of accounts:
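(The original post shows these tables in a screenshot; the sketch below is my reconstruction, with column types and sample values assumed from the queries that follow.)

CREATE TABLE dbo.Account
(
    UserId int NOT NULL,
    YearOpened int NOT NULL,
    AccountType varchar(20) NULL    -- NULL is a valid default
);

CREATE TABLE dbo.AccountType
(
    YearOpened int NOT NULL,
    AccountType varchar(20) NULL,
    Description varchar(100) NOT NULL
);

-- Hypothetical sample rows, including accounts with a NULL AccountType
INSERT INTO dbo.Account (UserId, YearOpened, AccountType)
VALUES (1, 2019, 'Premium'), (2, 2019, NULL);

INSERT INTO dbo.AccountType (YearOpened, AccountType, Description)
VALUES (2019, 'Premium', 'Premium account'), (2019, NULL, 'Standard account');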


These tables have the unfortunate design characteristics of:

  1. They use a natural, composite key of YearOpened and AccountType
  2. NULL is the valid default for AccountType

Neither of the above attributes is outright bad; we just need to handle them appropriately. For example, if we want to bring back a description of each user's account, we might write a query with an inner join like this:

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND a.AccountType = at.AccountType

Only to discover that the rows with NULLs are not present.


Joining on NULLs

Since it's not possible to join on NULL values in SQL Server like you might expect, we need to be creative to achieve the results we want.

One option is to make our AccountType column NOT NULL and set some other default value. Another option is to create a new column that will act as a surrogate key to join on instead.

Both of the above options would fix the problem at the source, but what about if we can only make changes to our queries?

One common approach is to convert the NULLs to some other non-NULL value using a function like COALESCE or ISNULL:

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND ISNULL(a.AccountType,'`') = ISNULL(at.AccountType,'`')


While this returns the results we want, there are two major issues with this approach:

  1. In the above example we converted NULLs to the ` character. If we had a valid ` character in our data, we would get logically incorrect joins.
  2. Our query can no longer perform index seeks.

The first issue isn't a huge deal if you can guarantee the character you are replacing NULLs with will never appear in the column of data.

The second issue is more important since ISNULL prevents your query from being SARGable and will cause poor performance on large tables of data.


The Compute Scalar operators in the resulting execution plan force SQL Server to scan the indexes and compute a value for every row.

A More Efficient Solution

If using a function like ISNULL hurts the performance of our queries, what can we do instead?

SELECT
    a.UserId,
    at.YearOpened,
    at.AccountType,
    at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND (a.AccountType = at.AccountType OR (a.AccountType IS NULL AND at.AccountType IS NULL))


This produces the exact same results while allowing SQL Server to seek when possible and avoid costly row-by-row computations.

There are no seeks here since I don't have any additional filters, but the lack of Compute Scalar operators should be enough to prove the point.

While there are a few more variations that can achieve the same results using different execution plans (writing a query that joins the non-NULLs and UNIONing it with a query that joins only the NULLs, using a computed column to convert the NULLs to non-NULL values, etc.), the key to good performance is choosing a solution that will not force SQL Server to compute values for every single row.
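For example, the UNION ALL variation might look something like this sketch (whether it beats the OR version above depends on your data and indexes):

-- Join the non-NULL rows with a plain equality predicate...
SELECT
    a.UserId, at.YearOpened, at.AccountType, at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
        AND a.AccountType = at.AccountType
UNION ALL
-- ...then append the rows where both sides are NULL
SELECT
    a.UserId, at.YearOpened, at.AccountType, at.Description
FROM
    dbo.Account a
    INNER JOIN dbo.AccountType at
        ON a.YearOpened = at.YearOpened
WHERE
    a.AccountType IS NULL
    AND at.AccountType IS NULL;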

Optimizing for Ad Hoc Workloads

Watch this week's video on YouTube

The execution plan cache is a great feature: after SQL Server goes through the effort of generating a query plan, SQL Server saves that plan in the plan cache to be reused again at a later date.

One downside to SQL Server caching almost all plans by default is that some of those plans won't ever get reused. Those single use plans will exist in the plan cache, inefficiently tying up a piece of the server's memory.
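You can get a rough sense of how much memory these single-use plans are occupying on your own server with a query like this sketch:

-- Count and total size of ad hoc plans that have only ever been used once
SELECT
    COUNT(*) AS single_use_plans,
    SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS total_mb
FROM
    sys.dm_exec_cached_plans
WHERE
    usecounts = 1
    AND objtype = 'Adhoc';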

Today I want to look at a feature that will keep these one-time use plans out of the plan cache.

Plan Stubs

Instead of filling the execution plan cache with plans that will never get reused, the optimize for ad hoc workloads option will cache a plan stub instead of the full plan. The plan stub is significantly smaller in size and is only replaced with the full execution plan when SQL Server recognizes that the same query has executed multiple times.

This reduces the amount of space one-time-use queries take up in the cache, allowing more reusable plans to remain in the cache for longer periods of time.

Enabling this server-level feature is as easy as:

sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
sp_configure 'optimize for ad hoc workloads', 1
GO
RECONFIGURE
GO
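If you are on a version that supports it (to my knowledge, SQL Server 2019+ and Azure SQL Database), there is also a database-scoped equivalent of this setting:

-- Database-scoped version of the same setting
ALTER DATABASE SCOPED CONFIGURATION
SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;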

Once enabled, you can watch the plan stubs take up less space in the cache:

-- Run each of these queries once
DECLARE @Username varchar = 'A'
SELECT UserName 
FROM IndexDemos.dbo.[User] 
WHERE UserName like @Username+'%';
GO

DECLARE @Username varchar = 'B'
SELECT UserName 
FROM IndexDemos.dbo.[User] 
WHERE UserName like @Username+'%';
GO

SELECT 
    cp.cacheobjtype,
    cp.objtype,
    cp.plan_handle,
    cp.size_in_bytes,
    qp.query_plan,
    st.text
FROM
    sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
    INNER JOIN sys.dm_exec_query_stats qs
        ON cp.plan_handle = qs.plan_handle
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE 
    st.text like 'DECLARE @Username varchar =%';

At 424 bytes each, these plan stubs are tiny!

Now if we run our second query filtering on UserName LIKE 'B%' again and then check the plan cache, we'll notice the stub is replaced with an actual compiled plan.


The downside to plan stubs is that they add some CPU load to our server: each query gets compiled twice before it can be reused from cache. However, since plan stubs reduce the size of our plan cache, more reusable queries can remain cached for longer periods of time.

Great! All my cache problems will be solved

Not necessarily.

If your workload truly involves lots of ad hoc queries (like many analysts all working on different problems or dynamic SQL that's generating completely different statements on every execution), enabling Optimize for Ad hoc Workloads may be your best option (Kimberly Tripp also has a great alternative: clearing single use plans automatically on a schedule).
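The blunt version of that scheduled-clearing idea looks something like the statement below; note that it clears ALL ad hoc and prepared plans, not just the single-use ones, so Kimberly's more surgical scripts are usually the better choice:

-- Removes ad hoc and prepared plans from the plan cache
DBCC FREESYSTEMCACHE ('SQL Plans');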

However, oftentimes single-use query plans have a more nefarious origin: unparameterized queries. In this case, enabling Optimize for Ad hoc Workloads may not negatively impact your server, but it certainly won't help. Why? Because those unparameterized queries will still be generated.

Brent Ozar has a good overview of why this happens, but the short answer is to force parameterization on your queries. When you enable forced parameterization, SQL Server will automatically parameterize your queries if they aren't already, reducing the number of one-off query plans in your cache.
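Enabling it is a single database-level setting. In this sketch the database name is a placeholder, and since forced parameterization changes compilation behavior for the entire database, test it carefully first:

-- Enable forced parameterization ('YourDatabase' is a placeholder)
ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED;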

Whether you are dealing with too many single use queries on your server or some other problem, just remember to find the root cause of the problem instead of just treating the symptoms.

Visualizing Nested Loops Joins And Understanding Their Implications

This post is the first in a series about physical join operators (be sure to check out part 2 - merge joins, and part 3 - hash match joins).

Watch this week's video on YouTube

What Physical Join Operators Tell Us

Everyone has their own method of reading an execution plan when performance tuning a slow SQL query.  One of the first things I like to look at is which kind of join operators are being used.


These three little icons may not seem like the most obvious place to begin troubleshooting a slow query, but with larger plans especially I like starting with a quick glance at the join operators because they allow you to infer a lot about what SQL Server thinks about your data.

This will be a three-part series where we'll learn how each join algorithm works and what they can reveal about our upstream execution plan operators.

Nested Loops Join


Nested loops joins work like this: SQL Server takes the first value from our first table (our "outer" table - by default SQL Server decides for us which table of the two this will be), and compares it to every value in our second "inner" table to see if they match. 

Once every inner value has been checked, SQL Server moves to the next value in the outer table and the process repeats until every value from our outer table has been compared to every value in our inner table.

This description is a worst-case example of the performance of a nested loops join.  Several optimizations exist that can make the join more efficient.  For example, if the inner table join values are sorted (because of an index you created or a spool that SQL Server created), SQL Server can process the rows much faster.


With an inner input indexed and sorted on the join key, SQL Server can seek directly to the rows it needs, reducing the total number of comparisons that need to be made.

For more in-depth explanations of the internals and optimizations of nested loops joins, I recommend reading this post by Craig Freedman as well as Hugo Kornelis's reference on nested loops.
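If you ever want to study this operator in isolation, you can nudge the optimizer into using it with a join hint. Here's a sketch using the Posts and Votes tables from the earlier stored procedure example (hints like this are for experimentation, not production code):

-- Force a nested loops join so its behavior can be observed in the plan
SELECT
    p.Id,
    v.VoteTypeId
FROM
    dbo.Posts p
    INNER LOOP JOIN dbo.Votes v
        ON v.PostId = p.Id;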

What Do Nested Loops Joins Reveal?

Knowing the internals of how a nested loops join works allows us to infer what the optimizer thinks about our data and the join's upstream operators, helping us focus our performance tuning efforts. 

Here are a few scenarios to consider the next time you see a nested loops join being used in your execution plan:

  • Nested loops joins are CPU intensive; at worst, every row needs to be compared to every other row, and this can take some time.  This means when you see a nested loops join, SQL Server probably thinks that one of the two inputs is relatively small.

    • ...and if one of the inputs is relatively small, great!  If instead you see upstream operators that are moving large amounts of data, you may have an estimation problem in this area of the plan and may need to update stats, add indexes, or refactor the query to have SQL Server provide better estimates (and maybe a more appropriate join).

  • Nested loops joins sometimes accompany RID or key lookups.  I always check for one of these because they often leave room for some performance improvements:

    • If a RID lookup exists, it's usually easy enough to add a clustered index to that underlying table to squeeze out some extra performance.

    • If either a RID or key lookup exists, I always check what columns are being returned to see if a smaller index could be used instead (by adding a key or included column to an existing index - see the sketch after this list) or if the query can be refactored to not bring back those columns (e.g. get rid of the SELECT *).

  • Nested loops joins do not require data to be sorted on input.  However, performance can improve with an indexed inner data source (as shown above), and SQL Server might choose a more efficient operator if the inputs are both sorted.

    • At the very least, nested loops joins make me think to check whether the input data isn't sorted because of some upstream transformations or because of missing indexes.
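As an example of that smaller-index idea, a hypothetical covering index like the one below can eliminate a key lookup by carrying the extra column a query needs in its INCLUDE list:

-- Hypothetical covering index: the key column supports the join seek,
-- the INCLUDE column satisfies the SELECT list without a key lookup
CREATE NONCLUSTERED INDEX IX_Votes_PostId
ON dbo.Votes (PostId)
INCLUDE (VoteTypeId);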

So while nested loops in your plans will always require more investigation, looking at them and the operators around them can provide some good insight into what SQL Server thinks about your data.

Bert & Pinal Troubleshoot a Slow Performing SQL Server

Watch this week's video on YouTube

What happens when you run into performance tuning expert Pinal Dave?  Talk about how to troubleshoot a slow performing SQL Server of course!

This week is all about the video, so if you can't view it above, head on over to my YouTube channel to watch it there.

After watching, be sure to check out Pinal's free performance diagnostic scripts over at SQLAuthority.com.