Are Stored Procedures Faster Than Stand-Alone Queries?

Published Tue 15 October 2019 in SQL > Development

A few months ago I was presenting for a user group when someone asked the following question:

Does a query embedded in a stored procedure execute faster than that same query submitted to SQL Server as a stand alone statement?

The room was pretty evenly split on the answer: some thought the stored procedures will always perform faster while others thought it wouldn't really matter.

In short, the answer is that the query optimizer will treat a query defined in a stored procedure exactly the same as a query submitted on its own.

Let's talk about why.

Start with a Plan

While submitting an "EXEC <stored procedure>" statement to SQL Server may require fewer packets of network traffic than submitting the several hundred (thousands?) lines that make up the query embedded in the procedure itself, that is where the efficiencies of a stored procedure end*.

*NOTE: There are certain SQL Server performance features, like temporary object caching, natively compiled stored procedures for optimized tables, etc… that will improve the performance of a stored procedure over an ad hoc query. However in my experience, most people aren't utilizing these types of features so it's a moot point.

After receiving the query, SQL Server's query optimizer looks at these two submitted queries exactly the same. It will check to see if a cached plan already exists for either query (and if one does, it will use that), otherwise it will send both queries through the optimization process to find a suitable execution plan. If the standalone query and the query defined in the stored procedure are exactly the same, and all other conditions on the server are exactly the same at the time of execution, SQL Server will generate the same plans for both queries.

To prove this point, let's look at the following query's plan as well as the plan for a stored procedure containing that same query:

CREATE OR ALTER PROCEDURE dbo.USP_GetUpVotes
    @UserId int
AS
SELECT  
    COUNT(*) AS UpVotes 
FROM 
    dbo.Posts p
    INNER JOIN Votes v
        ON v.PostId = p.Id 
WHERE 
    p.OwnerUserId = @UserId
    and VoteTypeId = 2
ORDER BY UpVotes DESC


EXEC dbo.USP_GetUpVotes 23

DECLARE @UserId int = 23

SELECT 
    COUNT(*) AS UpVotes 
FROM 
    dbo.Posts p
    INNER JOIN Votes v
        ON v.PostId = p.Id 
WHERE 
    p.OwnerUserId = @UserId
    and VoteTypeId = 2
ORDER BY UpVotes DESC

As you can see, the optimizer generates identical plans for both the standalone query and the stored procedure. In the eyes of SQL Server, both of these queries will be executed in exactly the same way.

But I Swear My Stored Procedures Run Faster!

I think that a lot of the confusion for thinking that stored procedures execute faster comes from caching.

As I wrote about a little while back, SQL Server is very particular about needing every little detail about a query to be exactly the same in order for it to reuse its cached plan. This includes things like white space and case sensitivity.

It is much less likely that a query inside of a stored procedure will change compared to a query that is embedded in code. Because of this, it's probably more likely that your stored procedure plans are being ran from cached plans while your individually submitted query texts may not be utilizing the cache. Because of this, the stored procedure may in fact be executing faster because it was able to reuse a cached plan. But this is not a fair comparison - if both plans would pull from the cache, or if both plans had to generate new execution plans, they would both have the same execution performance.

So does it matter if I use stored procedures or not?

So while in the majority of cases a standalone query will perform just as quickly as that same query embedded in a store procedure I still think it's better to use stored procedures when possible.

First, embedding your query inside of a stored procedure increases the likelihood that SQL Server will reuse that query's cached execution plan as explained above.

Secondly, using stored procedures is cleaner for organization, storing all of your database logic in one location: the database itself.

Finally, and most importantly, using stored procedures gives your DBA better insight into your queries. Storing a query inside of a stored procedure means your DBA can easily access and analyze it, offering suggestions and advice on how to fix it in case it is performing poorly. If your queries are all embedded in your apps instead, it makes it harder for the DBA to see those queries, reducing the likelihood that they will be able to help you fix your performance issues in a timely manner.

Communicating Effectively at the Workplace

Published Tue 08 October 2019 in Professional Development

MJ-t-sql-Tuesday

This post is a response to this month's T-SQL Tuesday #119 prompt by Alex Yates. T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month's topic asks to write about what in your career that you have changed your mind about.

Watch this week's video on YouTube

In my first years as a developer, I used to think that being a great programmer meant you knew the latest technologies, followed the best design patterns, and used the trendiest tools. Because of this, I focused my time strictly on those topics.

On the flip side, what I didn't spend a lot of time on were communication skills. I figured as long as I could write an email without too many typos or if I could coherently answer a question someone asked of me, then communication was not something I had to focus on.

Over the years I've changed my mind about this however: communication skills are often just as important (if not sometimes more important) as any of the other technical skills a great programmer uses.

The Importance of Communication

In my experience, I've found three major improvements from spending some of my learning time on improving my communication skills:

1. You get ideas across clearly, the first time.

Nothing is worse than spending a significant amount of time talking about ideas, requirements, and next steps, only to have to go through all of it again because not everyone had the same understanding. Not only does this waste valuable time, it may result in development rework and cause frustration among team members.

2. You leave everyone feeling good

Unclear communication, or lack of communication altogether, leaves doubt in the minds of those you work with. Not communicating clearly about your own progress leaves teammates questioning if your dependency will allow them to finish their work on time, makes your manager question whether the project will be delayed, and makes your customers concerned about the stability of the product.

Taking a proactive approach to communicating clearly sets people's expectations up front, leaving little room for any doubt.

3. You get your needs met quicker

Once you learn to communicate more effectively, you start to notice when others aren't doing the same. This allows you to recognize potential problems or unclear information from the start. Once you start recognizing this, you'll be able to reframe a question or follow up with more specifics, resulting in your needs getting met quicker.

Recommendations

Below are a few techniques that have helped me become a better communicator. I'm by no means an expert, but I have found these few things to work really well for my situation.

1. Put yourself in their shoes

Before sending an email, reread it from the perspective of everyone you are sending it to. Even though everyone will be receiving the same exact text, their interpretation may vary based on their background or context. You will be doing yourself a huge favor if you can edit your message so that each imagined perspective interprets the message in the same way.

2. Keep communication concise

Be ruthless with self-editing before you speak and with cutting unnecessary thoughts in written form. When writing, put your most important ideas or requests first. If anything needs to be seen beyond the first line, call it out by bolding the text or highlighting it. Don't risk someone skipping over the important parts of your message.

3. Follow up

People are busy. You aren't always a priority. While you have to handle it politely, reminding someone that you need something from them is important. If you communicated clearly from the outset, this is easy to do. If you reread your initial request and realize it wasn't as clear as it could be, this is your opportunity to do a better job.

More resources

The best way to get better at communicating is through practice. Be intentional about it. Write only what matters and speak as succinctly when it comes to project work. I'm far from being where I want to be in these respects, but every time I write an email, write a video script, or speak up in a team meeting, I want to make sure what I'm saying is clear and as easy to understand as possible.

More "Wrong" SQL Server Math - Floating Point Errors

Published Tue 01 October 2019 in SQL > Development

Watch this week's video on YouTube

Last week we looked at how implicit conversions and datatype precedence can cause SQL Server to output unexpected results (if you aren't aware of how it handles these features).

This week I want to share another example of when SQL Server's output may surprise you: floating point errors.

Charts don't add up to 100%

barchart

Years ago I was writing a query for a stacked bar chart in SSRS. The chart intended to show the percentage breakdown of distinct values in a table. For example, the chart would show that value A made up 30% of the rows, B made up 3%, C made up 12% and so on. Since every row had a value, I was expecting the stacked bar chart percentages to add up to 100%

However, in many instances the charts would come up short; instead of a full 100%, the percentages would only add up to 98% or 99%. What was going on?

Floats

To see an example of how this happens, let's look at the following query:

SELECT CASE WHEN CAST(.1 AS FLOAT)+CAST(.2 AS FLOAT) = CAST(.3 AS FLOAT) THEN 1 ELSE 0 END

If you've never encountered this error before, you'd expect the query to return a result of 1. However, it doesn't:

0result

The reason this happens is that anytime you use the float datatype, SQL Server is trading off numeric precision for space savings - in actuality, that .1 in the query above is really stored as 0.10000000000000000555111512312578270212 and the .2 is stored as 0.20000000000000001110223024625156540424. Summed together, we get 0.30000000000000001665334536937734810636, not .3.

And to be clear, this isn't a problem with SQL Server - any language that implements the IEEE standard for float data types experiences these same issues.

Float Approximation

In SQL Server, the int datatype can store every whole number from -2,147,483,648 to 2,147,483,647 in only 4 bytes of space.

A single precision float, using the same 4 bytes of data, can store almost any value between between -340,000,000,000,000,000,000,000,000,000,000,000,000 and 340,000,000,000,000,000,000,000,000,000,000,000,000.

The reason floats can store such a large range is because they are only storing approximate values; some compression happens in those 4 bytes that allows SQL Server to store a wider range of data, but the increased range of values comes at the cost of losing some accuracy.

Close Enough For Horse Shoes, Hand Grenades, and Floats

In short, a float works by storing its value as a percentage within a range. As an oversimplified example, imagine representing the number 17 as 17% of the range from 1 to 100. All you need to store is the range 1,100 and the number 17%. This doesn't give us much efficiency for small numbers, but this allows us to store much larger numbers in exactly the same way. For example, to store the number 3 billion, you could make your range 1 billion (10\^9), and 10 billion (10\^10), and the percentage of 30%.

And while that oversimplified example uses base-10 to make it easy for my brain to think about, computers like doing calculations in base-2. And the math is a little bit more involved.

The Wikipedia page on floating point encoding is really good, but it uses a bunch of math notation that I haven't seen since high school. Let's reimagine that example with language we should be a little bit more familiar with: T-SQL.

A 4 byte number is made up of 32 bits. The floating point encoding breaks down these bits into 3 sections:

floating-point

The first bit in blue is for the sign. This just indicates whether we will be left or right of 0 on the number line.

The next 8 bits in green indicate our exponent. This tells us which range of numbers we are in. Since we are using binary, the range is stored as a power of 2. And we only need to store the start of the range, since the end of the range would be the next power of 2.

Finally the last 23 bits in red encode the fractional location of our value within the range. Calculating our actual value then is simple as:

equation

Oh yeah, I promised to do the math in T-SQL. Let's try that again.

First, we declare some variables to store the 3 encoded parts of our floating point number:

DECLARE 
    @sign int,
    @exponent int,
    @fraction decimal(38,38);

You'll notice I am storing @fraction as a decimal and not float. This is some foreshadowing about how decimal is a precise datatype that I'll come back to in a little bit.

Next we store the sign. Since we are encoding the value .15625, the sign is positive, so we set our @sign bit to 0:

SELECT @sign = 0;

Great. Now let's calculate the value of our exponent. If you've never converted binary to decimal before, you basically raise each 1 or 0 to the power of its position, so:

-- Returns 124 
SELECT @exponent =
    (0*POWER(2,7))
    +(1*POWER(2,6))
    +(1*POWER(2,5))
    +(1*POWER(2,4))
    +(1*POWER(2,3))
    +(1*POWER(2,2))
    +(0*POWER(2,1))
    +(0*POWER(2,0));

Next up is converting the last 23 bits to decimal. In this case, the encoding standard specifies these are to be calculated as (1/2\^n) instead of the regular 2\^n, because we want a fraction:

-- Returns .25
SELECT @fraction = 
    (0*(1.0/POWER(2,1)))
    +(1*(1.0/POWER(2,2)))
    +(0*(1.0/POWER(2,3)))
    +(0*(1.0/POWER(2,4)))
    +(0*(1.0/POWER(2,5)))
    +(0*(1.0/POWER(2,6)))
    +(0*(1.0/POWER(2,7)))
    +(0*(1.0/POWER(2,8)))
    +(0*(1.0/POWER(2,9)))
    +(0*(1.0/POWER(2,10)))
    +(0*(1.0/POWER(2,11)))
    +(0*(1.0/POWER(2,12)))
    +(0*(1.0/POWER(2,13)))
    +(0*(1.0/POWER(2,14)))
    +(0*(1.0/POWER(2,15)))
    +(0*(1.0/POWER(2,16)))
    +(0*(1.0/POWER(2,17)))
    +(0*(1.0/POWER(2,18)))
    +(0*(1.0/POWER(2,19)))
    +(0*(1.0/POWER(2,20)))
    +(0*(1.0/POWER(2,21)))
    +(0*(1.0/POWER(2,22)))
    +(0*(1.0/POWER(2,23)));

Finally, we put them altogether like so:

-- Result: .15625
SELECT 
    POWER(-1,@sign)
    * POWER(CAST(2 AS DECIMAL(38,37)),(@exponent-127)) 
    * (1+@fraction)

In this example, there is no floating point error - .15625 can be accurately stored as a float. However, if go through the same exercise for a number like .1 or .2, you'll notice your numbers are not so perfect.

Fixing Floating Point Errors

Floating point math errors can be fixed in a few ways.

One option is to stop caring about them. The error occurring on floats is very small (although when compounded through arithmetic, the error can grow large enough to be noticeable like in my reporting bar chart example). If you are writing queries or reports where such a small amount of error doesn't matter, then you can continue on your merry way without having to change anything.

A second option is to still store values as floats (for that sweet, sweet storage space savings), but ensure your application code has business logic to correctly round numbers that are in precise.

However, if your data needs to be perfectly accurate every single time with no errors, use a different datatype. The logical choice here would be to use decimal in SQL Server, which uses a different internal method for storing your numbers, resulting in perfect results every time. However, the range of possible values is not as large as float, and you will pay for that precision with additional bytes of storage space.

In the end, floating point is good enough for many applications. The important thing is that you are aware that these kind of errors can happen and that you handle them appropriately.

SQL Server's "Wrong" Math

Published Tue 24 September 2019 in SQL > Development

Watch this week's video on YouTube

A couple of weeks ago I decided to rebuild my recording studio by getting rid of my fabric backdrop and replacing it with a true wall instead. Doing this would allow me more flexibility when shooting, further improving my filming process efficiency.

To determine how much lumber I would need for building the new walls, I decided to write a SQL query to help with my framing calculations. I was building a 6 foot wall and wanted to put a stud every 16 inches. Easy enough to do the math on this:

SELECT (6*12)/16

2019-09-23-19-41-32

The output of the query above was 4, indicating the number of studs I would need for one wall section.

What's interesting is that if we do this same equation in a calculator, we get a slightly different answer: 4.5.

$sql-math$

And while I didn't end up framing my walls incorrectly, if I trusted the output of my query I would have had some incorrectly sized walls.

Is SQL Server Bad At Simple Math?

What happened? Well it all has to do with how SQL Server handles calculations.

While tweeting about my studio rebuild process, Kenneth Fisher from SQL Studies tweeted about what I would learn about SQL Server from rebuilding the studio.

https://twitter.com/sqlstudent144/status/1170376006370283520

Jokingly, I tweeted back the above SELECT (6*12)/16 example because it is funny (scary?) how SQL Server chops off the .5 if you don't understand what's going on.

When you perform calculations in SQL Server, it converts any expressions to the datatype that has the highest precedence. In the above example, since all of the numbers we are dealing with are integers, SQL Server keeps the final answer as an integer, apparently not caring what should have come after the decimal.

The quick and dirty way to solve this is to include a datatype in the equation that allows for decimals and has a higher precedence than integer. Basically, convert one of the integers to a numeric by adding .0 to any of the values:

SELECT (6*12)/16.0

2019-09-23-19-53-29

This will then return the expected result.

Following up on Twitter, Andy Mallon mentions that you don't even need the 0, simply adding . will suffice:

https://twitter.com/AMtwo/status/1170391334500388865

SELECT * (6*12)/16.

Pat Phelan then took it a step further, saying you can use the e syntax if you want to get the same successful result but confuse your users:

https://twitter.com/YetAnotherSQL/status/1170400514716164096

SELECT * (6*12)/16e0

Out of all of these methods, I prefer adding the .0 because it is the least ambiguous. For calculations that matter however, I also like to throw a CAST around individual values or the entire equation to be certain that I am getting a result with the precision and scale that I expect instead of letting SQL Server automatically guess for me:

SELECT CAST((6*12)/16.0 AS NUMERIC (2,1))

Yes, it's a few extra characters, but the intent is clear.

Why Does All of This Matter?

Implicit conversions and datatype precedence are something that most people starting with SQL Server are not aware of until they discover that their results are "wrong". If your queries require precise answers, then you have to be precise and explicit in how you handle the data (otherwise you might build a studio wall incorrectly!)

For more information about all these types of conversions, check out Andy's post on the subject which has even more fun examples.

SQL Server Quick Tips

Published Tue 17 September 2019 in SQL > Development

Earlier this year I made a series of videos that were played before the start of several DBA Fundamentals presentations. These videos gave viewers something SQL related to watch while the online rooms were filling up with attendees before the start of each day's live presentation.

Since these videos only ran once, and because my regular filming location is under construction this week, I decided to republish them here today.

Hope you enjoy these quick tips on several SQL Server concepts.

STATISTICS IO, TIME

Enabling STATISTICS IO and TIME can help you benchmark your queries to see if your tuning efforts are helping or hindering execution performance:

Watch this video on YouTube

Key Concepts

The terminology surrounding keys in relational databases can be confusing - in this overview we summarize the difference between primary keys, foreign keys, natural keys, and more:

Watch this video on YouTube

SARGability

When performance tuning queries, one of first things to check is whether your queries are SARGable and allow the optimizer to seek to the data it needs:

Watch this video on YouTube