Two Words For Faster Scalar Functions

Published on: 2018-12-04

Watch this week’s episode on YouTube.

WITH SCHEMABINDING optimizations were added all the way back in SQL Server 2005.  So why bother talking about them in 2018?

Because no one is taking advantage of them!

Ok, maybe that’s a little unfair.  I’m sure some people take advantage of the optimizations, but most of the code I see posted online that could benefit doesn’t include it.  So let’s talk about an easy way for some of our non-data-accessing scalar functions to get a performance boost.

WITH SCHEMABINDING

When you create a function or view, you can add the WITH SCHEMABINDING option to prevent any database objects that the view/function uses from being modified.  This is a pretty cool feature which prevents you from making a table or column change that would cause a view/function to break.

And while that’s useful functionality on its own, what’s even better is that the SQL Server optimizer can do some clever things when it knows one of your non-data-accessing scalar functions is schema bound.

For example, let’s say we have these two functions.  You’ll notice the second one includes the WITH SCHEMABINDING syntax:
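
(A minimal sketch — the function names and the REPLACE-based bodies are illustrative stand-ins, not necessarily the exact originals:)

CREATE FUNCTION dbo.UDF_RemoveSpaces (@Input VARCHAR(100))
RETURNS VARCHAR(100)
AS
BEGIN
    -- Pure string manipulation; no tables or views referenced
    RETURN REPLACE(@Input, ' ', '');
END;
GO

CREATE FUNCTION dbo.UDF_RemoveSpaces_SB (@Input VARCHAR(100))
RETURNS VARCHAR(100)
WITH SCHEMABINDING -- the only difference
AS
BEGIN
    RETURN REPLACE(@Input, ' ', '');
END;
GO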

When SQL Server executes a function, by default it checks whether the function has any database objects it needs to access.  For our example functions above, this is a waste of time since neither function accesses any data.

The WITH SCHEMABINDING option forces SQL Server to take note at function creation time of whether any data access is occurring.  This allows SQL Server to skip that check at run time, allowing the function to execute significantly faster:
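
(A sketch of a test along those lines, using the illustrative functions above — the million-row pseudo-cursor built from sys.all_columns and the temp table names are assumptions:)

SET STATISTICS TIME ON;

SELECT TOP (1000000)
       dbo.UDF_RemoveSpaces('Hello World') AS CleanValue
INTO   #Temp1
FROM   sys.all_columns a
CROSS JOIN sys.all_columns b;

SELECT TOP (1000000)
       dbo.UDF_RemoveSpaces_SB('Hello World') AS CleanValue
INTO   #Temp2
FROM   sys.all_columns a
CROSS JOIN sys.all_columns b;

SET STATISTICS TIME OFF;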

Take a look at those CPU times: 1594ms vs 62ms!  SQL Server saves a LOT of time by not having to verify the underlying data sources (for a more in-depth explanation of how SQL Server checks whether a function accesses data, I highly recommend reading this StackOverflow answer by Paul White).

WITH SCHEMABINDING also provides a performance optimization for queries that would normally need Halloween Protection.  Halloween Protection prevents SQL Server from modifying the same records more than once within a single statement and is usually implemented by the addition of a spool operator to the execution plan:
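
(A sketch of the kind of update where this shows up, reusing the illustrative functions and temp tables from above:)

-- Non-schema-bound: SQL Server can't rule out data access,
-- so the plan adds a Table Spool for Halloween Protection
UPDATE #Temp1
SET    CleanValue = dbo.UDF_RemoveSpaces(CleanValue);

-- Schema-bound: SQL Server knows the function touches no data,
-- so no spool is required
UPDATE #Temp2
SET    CleanValue = dbo.UDF_RemoveSpaces_SB(CleanValue);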

In this example, you’ll notice our non-schema-bound function introduces a Table Spool, while our second, schema-bound function forgoes this addition since SQL Server knows there’s no potential for conflict.

In conclusion, if you have a non-data-accessing scalar function you should always add WITH SCHEMABINDING to reap the benefits of SQL Server’s optimizations when available.

Thanks for reading. You might also enjoy following me on Twitter.


19 thoughts on “Two Words For Faster Scalar Functions”

  1. While I understand your enthusiasm for using SCHEMABINDING for performance, as a DBA I find that people who use it introduce far more problems than they solve, usually by breaking the application patch/release process. As you state, it locks the underlying objects, so you need to ensure that whoever is implementing SCHEMABINDING for their views/functions/reports/etc. is talking to the people who maintain those objects. Otherwise you’ll find fights breaking out between your BI and Dev teams 🙂

    1. Thanks for commenting Gavin. Is the release process breaking because the dev environment doesn’t match production, causing elevates to fail because they don’t include the logic to temporarily disable SCHEMABINDING? If so, that’s more of a process failure than an issue with SCHEMABINDING (I am speaking from experience here…I have been the cause of that process failure several times haha 😂)

  2. You are right, but usually you should avoid scalar functions and replace them with inline table-valued functions.

    I made a test and created a
    CREATE OR ALTER FUNCTION dbo.udf_inline
    (@input VARCHAR(100))
    RETURNS TABLE AS RETURN
    SELECT REPLACE(@input, ' ', '') CleanValue;
    GO
    with and without SCHEMABINDING.
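
    (For reference, an inline TVF like this is applied as a row source rather than called per value — a minimal usage sketch against a hypothetical #Data table:)

    SELECT d.SomeValue, f.CleanValue
    FROM #Data d
    CROSS APPLY dbo.udf_inline(d.SomeValue) f;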

    Things now become a little bit complicated.

    When I test it with your script (inserting into #temp3 / #temp4) on SQL 2017, it is slower than the schema-bound scalar function (64 vs 32 ms CPU and 2000 ms vs 3 ms total run time). But this happens only because the insert goes parallel and has to spool / wait / whatever for this.

    With OPTION (MAXDOP 1), or when I just select the data (and don’t insert it into a new temp table), I’m back to the schema-bound scalar function’s numbers.

    -> as always, it depends which is better in which situation, and we need to test before deciding

    1. 100% get rid of the scalar functions when you can…and if you can’t, then use SCHEMABINDING :).

      And your last line sums up every belief I have about SQL Server.

  3. I’ve not tested the code from this particular article but will. While the article is well written, it has a major fault for testing… you MUST NOT use SET STATISTICS when testing Scalar or mTVF functions. It seriously skews the results. It’s ok to use against iTVFs but not Scalar or mTVF functions. Please see the following article for more information on that subject.

    http://www.sqlservercentral.com/articles/T-SQL/91724/
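
    (A minimal sketch of a duration-based alternative — bracketing a variable-assignment run with time captures instead of relying on SET STATISTICS; the row source, the earlier sketch function, and the variable names are all illustrative:)

    DECLARE @Bitbucket VARCHAR(100); -- dump target so no rows go to the client
    DECLARE @StartTime DATETIME2(7) = SYSDATETIME();

    SELECT TOP (1000000)
           @Bitbucket = dbo.UDF_RemoveSpaces('Hello World')
    FROM   sys.all_columns a
    CROSS JOIN sys.all_columns b;

    SELECT DATEDIFF(MILLISECOND, @StartTime, SYSDATETIME()) AS DurationMs;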

    1. Thanks for the link Jeff, I never knew of SET STATISTICS TIME’s significant impact on CPU time when dealing with scalar functions – is this something you’ve seen documented by MSFT? I found some references to older versions of SQL Server, but couldn’t find anything current (https://support.microsoft.com/en-us/help/931279/sql-server-timing-values-may-be-incorrect-when-you-use-utilities-or-te).

      I reran my code using your testing framework several times and fortunately it still seems like WITH SCHEMABINDING provides about an order of magnitude better performance in these tests.

      1. Ah… to be sure, I wasn’t suggesting that WITH SCHEMABINDING wouldn’t work. It’s absolutely one of the best things to include in functions, Scalar or otherwise, for all of the reasons your good article mentions. I just didn’t want people to come away thinking that SET STATISTICS is a correct method for measuring performance when Scalar or mTVF functions are part of the test.

        And, no… to the best of my knowledge, MSFT has not documented the problem I speak of anywhere. Of course, they also haven’t mentioned things like the FORMAT function being 44 times slower than CONVERT, either. 😉

  4. Bert,

    Thanks for sharing this! Would this apply in any way to CLR scalar-valued functions? I can do some testing but I wasn’t sure if you knew off the top of your head. I don’t think I’ve ever even attempted to use WITH SCHEMABINDING on a function that calls an external resource…

  5. Ran the code and a funny thing has happened… although the Schema Binding version does run faster, the table spool does NOT show up in the non-Schema Binding version for me. I’m running SQL Server 2016 EE on SSDs.

    Also, if I use varying data instead of fixed data, the differences in duration are much less.

    1. That’s interesting Jeff. I ran the code on 2017 developer and 2016 EE and both show the spool. I wonder what the difference in our setups could be 🤔

      1. The only spool that I’m seeing is for the self join on the table that you’re using as a “Pseudo-Cursor” row-source and it appears in both execution plans. Same thing shows up in SQL Server 2008. I have no idea why there’s no spool associated with the non-Schema_Binding function because I’ve seen such a thing wrt other functions.

        I’m also finding that the differences are much less if the data that you’re passing to the functions is random (more like real data) than when passing a single value. If the data you used were in a table, it would have a cardinality of “1” and that can sometimes set a huge skew on the performance of some methods because SQL Server looks at it and says “Well, shoot… there’s only one real value in this table so I can take a shortcut”. I tried putting your and my data in tables and that doesn’t appear to affect such things here but it sure affects things in your original test.

        Again, I’m not even coming close to suggesting that people don’t use the SCHEMA_BINDING option. It would be nearly stupid to not include it especially in “calculation only” functions. I’m just interested in how the “Devils in the data” make things change for these types of things.

          1. Ah… crud. Figured it out. Getting hard for me to see in my old age, I guess. The execution plans you posted weren’t for the original INSERTs into #Temp1 and #Temp2. They were for the UPDATEs that I apparently skipped right over (lordy… how did I miss those two lines of obvious code?).

            I’m going to play some more and try to get back to this with some code that might explain the huge difference in performance on the initial INSERTs.

            Again, using SCHEMA_BINDING is still the right thing to do. I’m just working on the “Devils in the data”.

  6. @Gavin Harris: Before we started using SCHEMABINDING we had breakdowns post-release. Someone would change a Table (say), some little-used function would be impacted, it wouldn’t be found during normal QA, and a long time later it would break in the field. Yeah, I question why QA didn’t find it, but such things happen.

    Since putting SCHEMABINDING in Functions, each time we change a table all the impacted functions have to be dropped first and recreated. That brings to light any re-CREATE that fails … so we fix the Function … and that happens LONG before it gets anywhere near QA, avoiding a QA-DEV-fix-QA cycle. But, yeah, we had to adapt our Release Process so that any such DROP did actually get recreated and into the scripting process, particularly where there was no actual change to the code for the Function (i.e. it does not have a recent Check-in / Modify-date etc.)
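
    (To illustrate the failure mode with a hypothetical table and a schema-bound function that does reference it:)

    CREATE FUNCTION dbo.CleanCustomerNames ()
    RETURNS TABLE
    WITH SCHEMABINDING
    AS RETURN
        SELECT REPLACE(FullName, ' ', '') AS CleanName
        FROM dbo.Customer; -- two-part names are required under SCHEMABINDING
    GO

    -- Fails at release time (Msg 5074 / 4922): the column can't be
    -- altered while the schema-bound function depends on it
    ALTER TABLE dbo.Customer ALTER COLUMN FullName VARCHAR(200);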

    We have a house rule that SCHEMABINDING must be mentioned in every Function, even if it is only as a comment to say “SCHEMABINDING not appropriate here” … but thanks to this article I ran a GREP and found half a dozen new ones that don’t include it – and, indeed, they have no dependencies at all, so they will get a performance boost.

    Amazingly our SOP doesn’t actually detail this coding requirement – how did that happen? 🙂 So something else I can tidy up today

    1. Glad to hear you are adding it as a best practice Kristin. I agree that once you nail down the release process, it should be painless (and beneficial!)
