How to Eliminate Ugly Nested REPLACE() Functions

Published Tue 15 August 2017 in SQL > Development

"On white: Who you really are" by James Jordan is licensed under CC BY-ND 2.0

How many times have you had to transform some column value and ended up stacking several nested SQL REPLACE() functions like this?

-- Input: Red, Blue, Green
-- Output: RGB
SELECT 
    REPLACE(
    REPLACE(
    REPLACE(
    REPLACE(c.Colors,'Red','R')
    ,'Green','G')
    ,'Blue','B')
    ,', ','') AS Colors
FROM
    (SELECT 'Red, Green, Blue' AS Colors) c

Ugly right? And that's after careful formatting to try and make it look readable. I could have just left it as:

-- Input: Red, Blue, Green
-- Output: RGB
SELECT REPLACE(REPLACE(REPLACE(REPLACE(c.Colors,'Red','R'),'Green','G'),'Blue','B'),', ','') AS Colors
FROM
    (SELECT 'Red, Green, Blue' AS Colors) c

Here we only have 4 nested REPLACE functions. My shameful record is 29. I'm not proud of it, but sometimes it's the only way to get things done.

Not only are these nested REPLACE() functions difficult to write, but they are difficult to read too.

Instead of suffering through all of that ugly nesting, what you can do instead is use CROSS APPLY:

-- Input: Red, Blue, Green
-- Output: RGB
SELECT 
    s.Colors
FROM
    (SELECT 'Red, Green, Blue' AS Colors) c
    CROSS APPLY (SELECT REPLACE(c.Colors,'Red','R') AS Colors) r
    CROSS APPLY (SELECT REPLACE(r.Colors,'Green','G') AS Colors) g
    CROSS APPLY (SELECT REPLACE(g.Colors,'Blue','B') AS Colors) b
    CROSS APPLY (SELECT REPLACE(b.Colors,', ','') AS Colors) s

Technically the CROSS APPLY solution uses more characters. But it is infinitely more readable.

And the server? The server doesn't care about the additional characters —it still compiles it down to the same 1s and 0s:

a78b7-1grof8vly9hnzts5u2jh4gq

So next time you have to nest several REPLACE() functions (or any other string functions), do yourself a favor and make it more readable by using CROSS APPLY instead. Your future self will thank you.

SQLskills is Giving Away Free Training!

Published Fri 11 August 2017 in Meta

1d376-11cd50lita4ojbviwjefhww

SQLskills is giving away free training for their performance tuning and optimization classes. My entry for the competition is below. If you decide to enter for yourself, entries are due by 11:59 PM Pacific Time on Sunday, August 13th, 2017.

Watch this week's video on YouTube

I would love to attend the Immersion Event on Performance Tuning and Optimization — Part 1 training with Paul and Kimberly of SQLskills.

Why do I want to attend?

I want my brain to be filled past the brim with SQL Server internals and performance tuning knowledge. And I know this class will provide that opportunity.

I saw Kimberly present at PASS Summit in 2013. In 75 minutes, I had filled 4 pages of notes about skewed table data and how it affects statistics. I received great information that was immediately applicable to the queries I was working on back at the office.

I've never seen Paul present live, but I've been responsible for more than a few dozen hits to his blog posts on DBCC IND and DBCC PAGE. Whenever I have a question about SQL Server internals, I inevitably think "does Paul have a blog post on this topic?"

I've heard from others that IEPTO1 is amazing (and exhausting…in a good way!). I've learned so much from reading SQLskills team's blog posts, watching Pluralsight, and sitting in at SQL Saturday sessions that I am certain that I would thoroughly enjoy a week of intense training with Paul and Kim.

How would I use the knowledge?

By giving back to the community.

Every week I write blog posts and create videos teaching analysts, developers, and DBAs how to improve their SQL querying skills. I speak at local user groups, SQL Saturdays, and at conferences. I help coworkers and those on Twitter with solving their SQL problems.

By taking this training, I will advance my own technical understanding, which in turn helps me be a better SQL mentor.

My favorite type of performance tuning challenge

I love it when I can decode some of the "magic" that SQL Server is doing behind the scenes.

For example, recently when learning to work with JSON in SQL Server 2016, I was mystified by how SQL Server could quickly filter JSON data using a non-persisted computed column index. Was it truly parsing JSON on the fly or was it doing something else?

That curiosity led me to investigate further with DBCC PAGE. To my surprise, SQL Server really wasn't persisting my parsed JSON values on the data pages; it was however persisting the parsed JSON property on the index pages.

One more SQL Server mystery revealed.

Thank you.

Thank you for running this competition and giving people the opportunity to receive world-class training.

Additionally, thank you for all of the blog posts, newsletters (book reviews!), Pluralsight courses, and everything else you do to help the SQL community; I have benefited tremendously from all of these resources over the years.

Why Parameter Sniffing Isn't Always A Bad Thing (But Usually Is)

Published Tue 08 August 2017 in SQL > Development

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Watch this week's video on YouTube

Last week we discussed how implicit conversions could be one reason why your meticulously designed indexes aren't getting used.

Today let's look at another reason: parameter sniffing.

Here's the key: Parameter sniffing isn't always a bad thing.

Most of the time it's good: it means SQL Server is caching and reusing query plans to make your queries run faster.

Parameter sniffing only becomes a problem when the cached plan isn't anywhere close to being the optimal plan for given input parameters.

So what's parameter sniffing?

Let's start with our table dbo.CoffeeInventory which you can grab from Github.

4dedb-17uyfuaa5zzf_h5civ9ww4w

The key things to know about this table are that:

We have a nonclustered index on our Name column.
The data is not distributed evenly (we'll see this in a minute)

Now, let's write a stored procedure that will return a filtered list of coffees in our table, based on the country. Since there is no specific Country column, we'll write it so it filters on the Name column:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
    SELECT Name, Price, Description 
    FROM Sandbox.dbo.CoffeeInventory
    WHERE Name LIKE @ParmCountry + '%'
END
GO

Let's take a look at parameter sniffing in action, then we'll take a look at why it happens and how to solve it.

EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica'
EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia'

Running the above statement gives us identical execution plans using table scans:

In this case we explicitly specified the parameter @ParmCountry. Sometimes SQL will parameterize simple queries on its own.

That's weird. We have two query executions, they are using the same plan, and neither plan is using our nonclustered index on Name!

Let's step back and try again. First, clear the query plan cache for this stored procedure:

DECLARE @cache_plan_handle varbinary(44)
SELECT @cache_plan_handle = c.plan_handle
FROM 
    sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
WHERE 
    text like 'CREATE%CoffeeInventory%' 
-- Never run DBCC FREEPROCCACHE without a parameter in production unless you want to lose all of your cached plans...
DBCC FREEPROCCACHE(@cache_plan_handle)

Next, execute the same stored procedure with the same parameter values, but this time with the 'Ethiopia' parameter value first. Look at the execution plan:

EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia'
EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica'

9a86e-1en_loooe8aojxr0jonohzg

Now our nonclustered index on Name is being utilized. Both queries are still receiving the same (albeit different) plan.

We didn't change anything with our stored procedure code, only the order that we executed the query with different parameters.

What the heck is going on here!?

This is an example of parameter sniffing. The first time a stored procedure (or query) is ran on SQL server, SQL will generate an execution plan for it and store that plan in the query plan cache:

SELECT
    c.usecounts,
    c.cacheobjtype,
    c.objtype,
    c.plan_handle,
    c.size_in_bytes,
    d.name,
    t.text,
    p.query_plan
FROM 
    sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
    CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) p
    INNER JOIN sys.databases d
    ON t.dbid = d.database_id
WHERE 
    text like 'CREATE%CoffeeInventory%'

2c94d-1w8c6qvsh_ca3jumixdbnjw

All subsequent executions of that same query will go to the query cache to reuse that same initial query plan — this saves SQL Server time from having to regenerate a new query plan.

Note: A query with different values passed as parameters still counts as the "same query" in the eyes of SQL Server.

In the case of the examples above, the first time the query was executed was with the parameter for "Costa Rica". Remember when I said this dataset was heavily skewed? Let's look at some counts:

SELECT 
    LEFT(Name,CHARINDEX(' ',Name)) AS Country, 
    COUNT(*) AS CountryCount 
FROM dbo.CoffeeInventory 
GROUP BY 
    LEFT(Name,CHARINDEX(' ',Name))

109ae-1rdzlvzj99ccxpni3mwjnaw

"Costa Rica" has more than 10,000 rows in this table, while all other country names are in the single digits.

This means that when we executed our stored procedure for the first time, SQL Server generated an execution plan that used a table scan because it thought this would be the most efficient way to retrieve 10,003 of the 10,052 rows.

This table scan query plan is only optimal for Costa Rica . Passing in any other country name into the stored procedure would return only a handful of records, making it more efficient for SQL Server to use our nonclustered index.

However, since the Costa Rica plan was the first one to run, and therefore is the one that got added to the query plan cache, all other executions ended up using the same table scan execution plan.

After clearing our cached execution plan using DBCC FREEPROCCACHE, we executed our stored procedure again but with 'Ethiopia' as our parameter. SQL Server determined that a plan with an index seek is optimal to retrieve only 6 of the 10,052 rows in the table. It then cached that Index Seek plan, which is why the second time around the 'Costa Rica' parameter received the execution plan with Index Seek.

Ok, so how do I prevent parameter sniffing?

This question should really be rephrased as "how do I prevent SQL Server from using a sub-optimal plan from the query plan cache?"

Let's take a look at some of the techniques.

1. Use WITH RECOMPILE or OPTION (RECOMPILE)

We can simply add these query hints to either our EXEC statement:

EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia' WITH RECOMPILE
EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica' WITH RECOMPILE

or to our stored procedure itself:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
    SELECT Name, Price, Description 
    FROM Sandbox.dbo.CoffeeInventory 
    WHERE Name LIKE @ParmCountry + '%'

    OPTION (RECOMPILE)

END
GO

What the RECOMPILE hint does is force SQL Server to generate a new execution plan every time these queries run.

Using RECOMPILE eliminates our parameter sniffing problem because SQL Server will regenerate the query plan every single time we execute the query.

The disadvantage here is that we lose all benefit from having SQL Server save CPU cycles by caching execution plans.

If your parameter sniffed query is getting ran frequently, RECOMPILE is probably a bad idea because you will encounter a lot of overheard to generate the query plan regularly.

If your parameter sniffed query doesn't get ran often, or if the query doesn't run often enough to stay in the query plan cache anyway, then RECOMPILE is a good solution.

2. Use the OPTIMIZE FOR query hint

Another option we have is to add either one of the following hints to our query. One of these would get added to the same location as OPTION (RECOMPILE) did in the above stored procedure:

OPTION (OPTIMIZE FOR (@ParmCountry UNKNOWN))

OPTION (OPTIMIZE FOR (@ParmCountry = 'Ethiopia'))

OPTIMIZE FOR UNKNOWN will use a query plan that's generated from the average distribution stats for that column/index. Often times it results in an average or bad execution plan so I don't like using it.

OPTIMIZE FOR VALUE creates a plan using whatever parameter value specified. This is great if you know your queries will be retrieving data that's optimized for the value you specified most of the time.

In our examples above, if we know the value 'Costa Rica' is rarely queried, we might optimize for index seeks. Most queries will then run the optimal cached query plan and we'll only take a hit when 'Costa Rica' is queried.

3. IF/ELSE

This solution allows for ultimate flexibility. Basically, you create different stored procedures that are optimized for different values. Those stored procedures have their plans cached, and then an IF/ELSE statement determines which procedure to run for a passed in parameter:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
    IF @ParmCountry = 'Costa Rica'
    BEGIN
    EXEC dbo.ScanningStoredProcedure @ParmCountry
    END
    ELSE
    BEGIN
    EXEC dbo.SeekingStoredProcedure @ParmCountry
    END
END
GO

This option is more work (How do you determine what the IF condition should be? What happens more data is added to the table over time and the distribution of data changes?) but will give you the best performance if you want your plans to be cached and be optimal for the data getting passed in.

Conclusion

Parameter sniffing is only bad when your data values are unevenly distributed and cached query plans are not optimal for all values.
SQL Server caches the query plan that is generated from the first run of a query/stored procedure with whatever parameter values were used during that first run.
Using the RECOMPILE hint is a good solution when your queries aren't getting ran often or aren't staying in the the query cache most of the time anyway.
The OPTIMIZE FOR hint is good to use when you can specify a value that will generate a query plan that is efficient for most parameter values and are OK with taking a hit for a sub-optimal plan on infrequently queried values.
Using complex logic (like IF/ELSE) will give you ultimate flexibility and performance, but will also be the worst for long term maintenance.

Are your indexes being thwarted by mismatched datatypes?

Published Tue 01 August 2017 in SQL > Performance Tuning > Indexing

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Watch this week's video on YouTube

Have you ever encountered a query that runs slowly, even though you've created indexes for it?

There's a few different reasons why this may happen. The one I see most frequently happens in the following scenario.

I'll have an espresso please

Let's say I have a table dbo.CoffeeInventory of coffee beans and prices that I pull from my favorite green coffee bean supplier each week. It looks something like this:

-- Make sure Actual Execution Plan is on
-- Let's see what our data looks like
SELECT * FROM dbo.CoffeeInventory

If you want to follow along, you can get this data set from this GitHub Gist

I want to be able to efficiently query this table and filter on price, so next I create an index like so:

CREATE CLUSTERED INDEX CL_Price ON dbo.CoffeeInventory (Price)

Now, I can write my query to find out what coffee prices are below my willingness to pay:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < 6.75

You would expect this query to be blazing fast and use a clustered index seek, right?

WRONG!

What the heck?

Why is SQL scanning the table when I added a clustered index on the column that I am filtering in my predicate? That's not how it's supposed to work!

Well dear reader, if we look a little bit closer at the table scan operation, we'll notice a little something called CONVERT_IMPLICIT:

CONVERT_IMPLICIT: ruiner of fast queries

What is CONVERT_IMPLICIT doing? Well as it implies, it's having to convert some data as it executes the query (as opposed to me having specified an explicit CAST() or CONVERT() function in my query).

The reason it needs to do this is because I defined my Price column as a VARCHAR(5):

Who put numeric data into a string datatype? Someone who hasn't had their coffee yet today.

In my query however, I'm doing a comparison against a number WHERE Price < 6.75. SQL Server is saying it doesn't know how to compare a string to a number, so it has to convert the VARCHAR string to a NUMERIC(3,2).

This is painful.

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can't seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn't only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

<https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine>

That's a lot of orange circles/implicit conversions!

How can I query my coffee faster?

Well in this scenario, we have two options.

Fix the datatype of our table to align with the data actually being stored in this (data stewards love this).
Not cause SQL Server to convert every row in the column.

Number 1 above is self-explanatory, and the better option if you can do it. However, if you aren't able to modify the column type, you are better off writing your query like this:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < '6.75'

d03ed-1uzge0e3lizkhtodyonsfmg

Since we do a comparison of equivalent datatypes, SQL Server doesn't need to do any conversions and our index gets used. Woo-hoo!

What about the rest of my server?

Remember that chart above? There are a lot of different data comparisons that can force a painful column side implicit conversion by SQL Server.

Fortunately, Jonathan Kehayias has written a great query that helps you find column side implicit conversions by querying the plan cache. Running his query is a great way to identify most of the implicit conversions happening in your queries so you can go back and fix them — and then rejoice in your improved query performance!

One SSMS Trick That Will Make You a Faster Query Builder

Published Tue 25 July 2017 in SQL > SSMS

"17/365: i could be your magician" by Jin is licensed under CC BY 2.0

Watch this week's video on YouTube

Here's the scenario: you copy and paste some code into a query you are building. A few minutes later you need that same snippet again, but you've already copied and pasted something else onto the clipboard.

The next five minutes of your life are spent searching across the twenty query editor tabs you have open looking for that original piece of code.

Sound familiar?

THERE'S A BETTER WAY!

Copying and pasting is a feature that's available in nearly every text editor ("nearly" — anyone remember the days before iOS had a clipboard?).

However, SQL Server Management Studio goes above and beyond the regular copy and paste feature set — it has a clipboard ring.

What's a clipboard ring you ask?

ec575-1vgzb1j34ahgunbqofrgora

The clipboard ring let's you cycle through the last 20 things you copied onto your clipboard when you go to paste in SSMS. It can be accessed in the Edit menu (like in the screenshot above) or by using the keyboard shortcut CTRL + SHIFT + V.

Let's say you have the following queries:

----------------- Query 1 --------------------------
SELECT FruitId FROM dbo.Fruits WHERE Name = 'Apple'
----------------- Query 2 --------------------------
SELECT FruitId FROM dbo.Fruits WHERE Name = 'Banana'
----------------- Query 3 --------------------------
SELECT FruitId FROM dbo.Fruits WHERE Name = 'Orange'

And let's pretend you want to copy all of the fruit names into the IN statement of this query:

SELECT FruitId FROM dbo.Fruit WHERE Name IN ()

Instead of copying and pasting each fruit separately, you can batch your copies together and then paste them from the clipboard ring into your IN statement at the same time:

5c701-19e4bf0fjmtpji4ky8bz-mq

Use this trick the next time you need to find that snippet of code you used right before heading off to lunch and I guarantee you will be saving yourself tons of time.