Are your indexes being thwarted by mismatched datatypes?

Published Tue 01 August 2017 in SQL > Performance Tuning > Indexing

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."

Watch this week's video on YouTube

Have you ever encountered a query that runs slowly, even though you've created indexes for it?

There's a few different reasons why this may happen. The one I see most frequently happens in the following scenario.

I'll have an espresso please

Let's say I have a table dbo.CoffeeInventory of coffee beans and prices that I pull from my favorite green coffee bean supplier each week. It looks something like this:

-- Make sure Actual Execution Plan is on
-- Let's see what our data looks like
SELECT * FROM dbo.CoffeeInventory

If you want to follow along, you can get this data set from this GitHub Gist

I want to be able to efficiently query this table and filter on price, so next I create an index like so:

CREATE CLUSTERED INDEX CL_Price ON dbo.CoffeeInventory (Price)

Now, I can write my query to find out what coffee prices are below my willingness to pay:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < 6.75

You would expect this query to be blazing fast and use a clustered index seek, right?

WRONG!

What the heck?

Why is SQL scanning the table when I added a clustered index on the column that I am filtering in my predicate? That's not how it's supposed to work!

Well dear reader, if we look a little bit closer at the table scan operation, we'll notice a little something called CONVERT_IMPLICIT:

CONVERT_IMPLICIT: ruiner of fast queries

What is CONVERT_IMPLICIT doing? Well as it implies, it's having to convert some data as it executes the query (as opposed to me having specified an explicit CAST() or CONVERT() function in my query).

The reason it needs to do this is because I defined my Price column as a VARCHAR(5):

Who put numeric data into a string datatype? Someone who hasn't had their coffee yet today.

In my query however, I'm doing a comparison against a number WHERE Price < 6.75. SQL Server is saying it doesn't know how to compare a string to a number, so it has to convert the VARCHAR string to a NUMERIC(3,2).

This is painful.

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can't seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn't only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

<https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine>

That's a lot of orange circles/implicit conversions!

How can I query my coffee faster?

Well in this scenario, we have two options.

Fix the datatype of our table to align with the data actually being stored in this (data stewards love this).
Not cause SQL Server to convert every row in the column.

Number 1 above is self-explanatory, and the better option if you can do it. However, if you aren't able to modify the column type, you are better off writing your query like this:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < '6.75'

d03ed-1uzge0e3lizkhtodyonsfmg

Since we do a comparison of equivalent datatypes, SQL Server doesn't need to do any conversions and our index gets used. Woo-hoo!

What about the rest of my server?

Remember that chart above? There are a lot of different data comparisons that can force a painful column side implicit conversion by SQL Server.

Fortunately, Jonathan Kehayias has written a great query that helps you find column side implicit conversions by querying the plan cache. Running his query is a great way to identify most of the implicit conversions happening in your queries so you can go back and fix them — and then rejoice in your improved query performance!

My Most Embarrassing SQL Moment

Published Tue 11 July 2017 in SQL > TSqlTuesday

T-SQL Tuesday #92: Lessons Learned the Hard Way

55ae8-1lh0mvkliatliiikt0vlyow

This post is a response to this month's T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month's topic is Lessons Learned the Hard Way.

Watch this week's video on YouTube

"Is this your query that's killing the server?"

It was my first week on the job and I was learning to query one of our major databases.

Up until that point, my SQL experience was limited to working on a *tiny* e-commerce database. Query performance was never something I had to deal with because any query I wrote, no matter how poorly written, would always execute quickly.

This new database I was working on though had tables with a billion+ rows. I should have been more conscious about how I was writing my joins and filtering my records, but I wasn't. I wrote my query and executed it in SQL Server Management Studio.

About 20 minutes into my query's execution, I received an email from my new DBA, and it looked something like this:

Uhh, there might be a problem here

"Is this your query that's killing the server?"

Oops.

I don't think my mouse ever moved to the stop execution button as quickly as it did that moment.

I was incredibly embarrassed to have brought our production server to a crawl. I was also incredibly embarrassed to have had my first interaction with my new DBA be about a query that created major problems for him.

Although there were no long-term damages from my server-crushing query, it was a scenario that I definitely didn't want to relive again in the future.

Next time: don't do that again

Obviously, this was an experience where I learned that maybe I shouldn't write queries against unfamiliar data in production.

I should have been practicing on a dev database.
I should have looked at table meta data and made sure I understood relationships between tables better.
I should have done some more preliminary querying with more restrictive filters to be able to catch performance problems earlier on with smaller result sets.
I should have examined what indexes were available and made sure I was attempting to use them.
I should have used a (NOLOCK) if I absolutely had to test on the production data so that at the very least I wouldn't prevent the high transaction ETLs from modifying data in that database.

All of those "should haves" quickly became my checklist for what to do before running any query in an unfamiliar environment.

I've still written plenty of ugly and inefficient queries since then, however none of them ever caused me to bring the SQL server to a halt like I did in my first week. That was one lesson that I learned the hard way.

XML vs JSON Shootout: Which is Superior in SQL Server 2016?

Published Tue 16 May 2017 in SQL > Development > JSON

"A duel is a duel" by Emanuele Rosso is licensed under CC BY-NC-ND 2.0

Watch this week's video on YouTube

Additional performance comparisons available in an updated post.

Starting with the 2016 release, SQL Server offers native JSON support. Although the implementation is not perfect, I am still a huge fan.

Even if a new feature like JSON support is awesome, I am only likely to use it if it is practical and performs better than the alternatives.

Today I want to pit JSON against XML and see which is the better format to use in SQL Server.

Enter XML, SQL's Bad Hombre

Full disclosure: I don't love XML and I also don't love SQL Server's implementation of it.

XML is too wordy (lots of characters wasted on closing tags), it has elements AND attributes (I don't like having to program for two different scenarios), and depending on what language you are programming in, sometimes you need schema files and sometimes you don't.

SQL Server's implementation of XML does have some nice features like a dedicated datatype that reduces storage space and validates syntax, but I find the querying of XML to be clumsy.

All XML grievances aside, I am still willing to use XML if it outperforms JSON. So let's run some test queries!

Is JSON SQL Server's New Sheriff in Town?

Although performance is the final decider in these comparison tests, I think JSON has a head start over XML purely in terms of usability. SQL Server's JSON function signatures are easier to remember and cleaner to write on screen.

The test data I'm using is vehicle year/make/model data from https://github.com/arthurkao/vehicle-make-model-data. Here's what it looks like once I loaded it into a table called dbo.XmlVsJson:

CREATE TABLE dbo.XmlVsJson
(
  Id INT IDENTITY PRIMARY KEY,
  XmlData XML,
  JsonData NVARCHAR(MAX)
)

(The full data query is available in this gist if you want to play along at home)

Data Size

So XML should be larger right? It's got all of those repetitive closing tags?

SELECT
  DATALENGTH(XmlData)/1024.0/1024.0 AS XmlMB,
  DATALENGTH(JsonData)/1024.0/1024.0 AS JsonMB
FROM
  dbo.XmlVsJson

82246-1ra2xqdbn4movjivmlhtnxa

Turns out the XML is actually smaller! How can this be? This is the magic behind the SQL Server XML datatype. SQL doesn't store XML as a giant string; it stores only the XML InfoSet, leading to a reduction in space.

The JSON on the other hand is stored as regular old nvarchar(max) so its full string contents are written to disk. XML wins in this case.

0745a-1liaytu4vkxadqugdbn1j3g

INSERT Performance

So XML is physically storing less data when using the XML data type than JSON in the nvarchar(max) data type, does that mean it will insert faster as well? Here's our query that tries to insert 100 duplicates of the row from our first query:

SET STATISTICS TIME ON

INSERT INTO dbo.XmlVsJson (XmlData)
SELECT XmlData FROM dbo.XmlVsJson 
  CROSS APPLY 
  (
    SELECT DISTINCT number 
    FROM master..spt_values 
    WHERE number BETWEEN 1 AND 100
  )t WHERE Id = 1
GO

INSERT INTO dbo.XmlVsJson (JsonData)
SELECT JsonData FROM dbo.XmlVsJson 
  CROSS APPLY 
  (
    SELECT DISTINCT number 
    FROM master..spt_values 
    WHERE number BETWEEN 1 AND 100
  )t WHERE Id = 1
GO

And the results? Inserting the 100 XML rows took 613ms on my machine, while inserting the 100 JSON rows took 1305ms…XML wins again!

JSON ain't looking too hot. Wait for it…

I'm guessing since the XML data type physically stores less data, it makes sense that it would also write it out to the table faster as well.

CRUD Operations

I'm incredibly impressed by SQL Server's JSON performance when compared to .NET — but how does it compare to XML on SQL Server?

Read

Let's select the fragment for our second car from our XML and JSON:

SELECT t.XmlData.query('/cars/car[2]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

SELECT JSON_QUERY(t.JsonData, '$.cars[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

91b35-1xixep1a6ebvvn38_vvpavq

Result? JSON wins (at 0ms vs 63ms for XML) when needing to pluck out a fragment from our larger object string.

8a02b-1hzaxjaobi3dxqurwrelf4w

What if we want to grab a specific value instead of a fragment?

SELECT t.XmlData.value('(/cars/car[2]/model)[1]', 'varchar(100)') FROM dbo.XmlVsJson t 
WHERE Id = 1

SELECT JSON_VALUE(t.JsonData, '$.cars[1].model') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

Once again JSON wins with 0ms vs 11ms for XML.

41179-1esau9dtm4uezlxgarzxb1q

If you look at the execution plans for these last two queries, it's easy to see that XML has a lot more to do behind the scenes to retrieve the data:

XML:

327b8-1fgaieclnioqa5-zytrrndq

JSON:

45848-1s9aop29_eiltkvsnszkw9g

Create

We saw above that inserting rows of XML data is faster than inserting rows of JSON, but what if we want to insert new data into the object strings themselves? Here I want to insert the property "mileage" into the first car object:

db0c0-1btrkakuspyaj8iu5lq7wsg

UPDATE t SET XmlData.modify('
insert <mileage>100,000</mileage>
into (/cars/car[1])[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

UPDATE t SET JsonData = JSON_MODIFY(JsonData,
'$.cars[0].mileage','100,000') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

In addition to the cleaner syntax (JSON_MODIFY() is essentially the same as a REPLACE()) the JSON insert runs in 22ms compared to the 206ms for XML. Another JSON win.

c01d2-1kng0gj5jdkk5ujd1qlg5ig

Update

Let's update the mileage properties we just added to have values of 110,000:

UPDATE t SET XmlData.modify('
replace value of (/cars/car[1]/mileage/text())[1]
with     "110,000"') 
FROM dbo.XmlVsJson t
WHERE Id = 1

UPDATE t SET JsonData = JSON_MODIFY(JsonData, '$.cars[0].mileage','110,000') 
FROM dbo.XmlVsJson t
WHERE Id = 1

72c82-1hpv3kbbgv88tzuyozvineg

Result? JSON has the quicker draw and was able to perform this update in 54ms vs XML's 194ms.

c81d3-1fvual_pu3eyahvumnfimqg

Delete

Deleting large string data, a DBA's dream *snicker*.

Let's delete the mileage property, undoing all of that hard work we just did:

UPDATE t SET XmlData.modify('
delete /cars/car[1]/mileage[1]') 
FROM dbo.XmlVsJson t 
WHERE Id = 1

UPDATE t SET JsonData = JSON_MODIFY(JsonData, '$.cars[0].mileage', null) 
FROM dbo.XmlVsJson t 
WHERE Id = 1

JSON doesn't take any time to reload and wins against XML again 50ms to 159ms.

538ba-1ojms_3yjdktnni2a3pqcrg

Read Part 2: Indexes

So above we saw that JSON was faster than XML at reading fragments and properties from a single row of serialized data. But our SQL Server's probably have LOTS of rows of data — how well does indexed data parsing do in our match up?

First let's expand our data — instead of storing all of our car objects in a single field, let's build a new table that has each car on its own row:

(once again, full dataset at GitHub if you are playing along at home)

Now that we have our expanded data in our table, let's add some indexes. The XML datatype in SQL Server has its own types of indexes, while JSON simply needs a computed column with a regular index applied to it.

DROP INDEX IF EXISTS PXML_XmlData ON XmlVsJson2
CREATE PRIMARY XML INDEX PXML_XmlData
ON XmlVsJson2 (XmlData);

ALTER TABLE dbo.XmlVsJson2
ADD MakeComputed AS JSON_VALUE(JsonData, '$.make')
CREATE NONCLUSTERED INDEX IX_JsonData ON dbo.XmlVsJson2 (MakeComputed)

(Note: I also tried adding an XML secondary index for even better performance, but I couldn't get the query engine to use that secondary index on such a basic dataset)

If we try to find all rows that match a predicate:

SELECT Id, XmlData 
FROM dbo.XmlVsJson2 t 
WHERE t.XmlData.exist('/car/make[.="ACURA"]') = 1

SELECT Id, JsonData 
FROM dbo.XmlVsJson2 t 
WHERE JSON_VALUE(t.JsonData, '$.make') = 'ACURA'

XML is able to filter out 96 rows in 200ms and JSON accomplishes the same in 9ms. A final win for JSON.

327d9-1-jtwonmiccgvf25ksnmb3q

Conclusion

If you need to store and manipulate serialized string data in SQL Server, there's no question: JSON is the format of choice. Although JSON's storage size is a little larger than its XML predecessor, SQL Server's JSON functions outperform XML in speed in nearly all cases.

Is there enough performance difference to rewrite all of your old XML code to JSON? Probably not, but every case is different.

One thing is clear: new development should consider taking advantage of SQL Server's new JSON functions.

One SQL Cheat Code For Amazingly Fast JSON Queries

Published Tue 09 May 2017 in SQL > Development > JSON

f836d-17tk3zornsj5nirg7ai5dbq

Watch this week's video on YouTube

Recently I've been working with JSON in SQL Server 2016 a lot.

One of the hesitations many people have with using JSON in SQL Server is that they think that querying it must be really slow — SQL is supposed to excel at relational data, not string parsing right?

It turns out that performance is pretty good with the standalone SQL Server JSON functions. Even better is that it's possible to make queries against JSON data run at ludicrous speeds by using indexes on JSON parsed computed columns. In this post I want to take a look at how SQL is able to parse* with such great performance.

*"Parse" here is actually a lie —it's doing something else behind the scenes. You'll see what I mean, keep reading!

Computed Columns in SQL Server

The only way to get JSON indexes working on SQL server is to use a computed column. A computed column is basically a column that performs a function to calculate its values.

For example, let's say we have a table with some car JSON data in it:

DROP TABLE IF EXISTS dbo.DealerInventory;
CREATE TABLE dbo.DealerInventory
(
  Id int IDENTITY(1,1) PRIMARY KEY,
  Year int,
  JsonData nvarchar(300)
);

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Volkswagen", "Model" : "Golf" }');

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Honda", "Model" : "Civic" }');

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Subaru", "Model" : "Impreza" }');

SELECT * FROM dbo.DealerInventory;

/* Output:
Id    Year     JsonData
----- -------- ---------------------------------------------
1     2017     { "Make" : "Volkswagen", "Model" : "Golf" }
2     2017     { "Make" : "Honda", "Model" : "Civic" }
3     2017     { "Make" : "Subaru", "Model" : "Impreza" }
*/

We can add a new computed column to the table, "Make", which parses and extracts the Make property from each row's JSON string:

ALTER TABLE dbo.DealerInventory
ADD Make AS JSON_VALUE(JsonData, '$.Make');

SELECT * FROM dbo.DealerInventory;

/* Output:
Id Year  JsonData                                    Make
-- ----- ------------------------------------------- ----------
1  2017  { "Make" : "Volkswagen", "Model" : "Golf" } Volkswagen
2  2017  { "Make" : "Honda", "Model" : "Civic" }     Honda
3  2017  { "Make" : "Subaru", "Model" : "Impreza" }  Subaru
*/

By default, the above Make computed column is non-persisted, meaning its values are never stored to the database (persisted computed columns can also be created, but that's a topic for a different time). Instead, every time a query runs against our dbo.DealerInventory table, SQL Server will calculate the value for each row.

The performance of this isn't great — it's essentially a scalar function running for each row of our output :(. However, when you combine a computed column with an index, something interesting happens.

Time to dive in with DBCC Page

DBCC Page is an undocumented SQL Server function that shows what the raw data stored in a SQL page file looks like. Page files are how SQL Server stores its data.

In the rest of this post we'll be looking at how data pages (where the actual table data in SQL is stored) and index pages (where our index data is stored) are affected by non-persisted computed columns — and how they make JSON querying super fast.

First, let's take a look at the existing data we have. We do this by first turning on trace flag 3604 and using DBCC IND to get the page ids of our data. Additional details on the column definitions in DBCC IND and DBCC PAGE can be found in Paul Randal's blog post on the topic.

DBCC TRACEON(3604);

-- "Sandbox" is the name of my database
DBCC IND('Sandbox','dbo.DealerInventory',-1);

55ac9-1qczux6z_us9fzacehamhpa

If you look at the results above, row 2 contains our data page (indicated by PageType = 1) and the PagePID of that page is 305088 (if you are playing along at home, your PagePID is most likely something else). If we then look up that PagePID using DBCC PAGE we get something like this:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

65fdc-1jaa-flg1di52y3t2gohyxg

You can see our three rows of data highlighted in red. The important thing to note here is that our computed column of the parsed "Make" value is truly non-persisted and no where to be found, meaning it has to get generated for every row during query execution.

Now, what if we add an index to our non-persisted computed column and then run DBCC IND again:

CREATE NONCLUSTERED INDEX IX_ParsedMake ON dbo.DealerInventory (Make)

DBCC IND('Sandbox','dbo.DealerInventory',-1);

cab34-1vfyzcblexgrytsqu5miomg

You'll now notice that in addition to data page 305088 (PageType = 1), we also have an index page 305096 (PageType = 2). If we examine both the data page and the index page we see something interesting:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

DBCC PAGE('Sandbox',1,305096,3) WITH TABLERESULTS

Nothing has changed with our data page:

3348f-1ktty-u5w4iha9emrwys9zw

But our index page contains the parsed values for our "Make" column:

1b9ea-16ysi1x2ilwdo7j3tkwowlw

What does this mean? I thought non-persisted computed columns aren't saved to disk!

Exactly right: our non-persisted computed column "Make" isn't saved to the data page on the disk. However if we create an index on our non-persisted computed column, the computed value is persisted on the index page!

This is basically a cheat code for indexing computed columns.

SQL will only compute the "Make" value on a row's insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.

How to fix your terribly slow SQL job

Published Tue 02 May 2017 in SQL > Performance Tuning

This is a real-time progress bar for some of my old jobs. They are still running and stuck at 3%.

How many times have you written a program, ETL, analysis job, etc… that seemed like it would never finish running?

Although poor performance can be caused in a multitude of ways, the easiest to fix is by reducing your data in SQL Server instead of your in your programming/ETL/analysis layer (Excel, R, SAS, Python, ..NET, etc…).

SQL is built to handle and process data extremely efficiently. You will usually experience much better performance the more work (data merging, transformations, etc…) you can do to your data on the SQL server. I say "usually" because SQL won't always be faster than a programming language at transforming data, but 9 times out of 10 you can get faster results straight on the SQL Server.

Watch this week's video on YouTube

Let's look at one of my crappy processes

How many of us have ever written a process that does something like this:

1. Write the most basic query possible, something like SELECT * FROM dbo.User

2. Take the output of the above query, load it into Excel/SAS/Python/.NET/etc…

3. Write some code to filter the dataset

4. Write some code to summarize the data, transform columns, etc…

5. Write another SELECT * FROM dbo.Sale against the SQL Server to bring in more data

6. Bring it into Excel/SAS/Python/.NET/etc… and merge it with our original data

7. Repeat steps 3–6 as many times as needed

Some of my earliest PHP and MySQL websites worked exactly like this 😳! The code was slow on my server and users ended up suffering with slow webpage load times.

If the above process even slightly resembles something you've written before, continue reading on…

Why bother learning to transform data in SQL? I already know how to do that stuff in .

Old habits are hard to break, but you do want to make your processes run faster, right? This stuff is all easy, I promise!

Basically, if you are running code similar to above, the reason your job is slow is because you are not optimizing where your work is being performed:

Every time you write SELECT * you probably are bringing back more data than you actually need — you are hurting your performance.
Every time you don't have a WHERE clause, you are hurting your performance.
Every time your process queries the database multiple times (ie. multiple SELECT statements in your job to bring back data), you are hurting your performance.

In case you missed it, not taking the time to filter and reduce your data down as much as possible in your SQL is hurting your performance! Assuming your SQL Server and your programming layer are on different machines, you lose lots of time transferring unnecessary data over the wires (or air) as well as not efficiently using all of the advantages that your SQL server offers.

What's the solution to this inefficient processing?

Process your data on the SQL Server!

If you are not filtering, joining, and transforming your data until your programming layer, you are likely losing valuable SQL performance power and network efficiency. Here are some easy ways to reduce the size of your dataset on the SQL Server to improve performance in your jobs (and make your coworkers envious of your skills)!

SELECT [ColumnName]

If you are using SELECT *, stop!

SELECT * brings back all of the columns on your table, including the ones you don't need. This increases the amount of data sent over the network (which doesn't even get used) as well as increases the amount of data that needs to be read from disk (and storage hardware is usually relatively slow). Not to mention if your table is using indexes, SELECT * most likely causes some of those indexes not to be used as efficiently (or at all) which causes your queries to slow down even further.

But what if you do need all of the columns on a particular table? You still shouldn't use SELECT *! Although there's no performance difference, using SELECT * just means you are taking on technical debt. In the future, when a column gets added or removed from your table, your downstream processes may break because they are now automatically receiving (or no longer receiving) that column. Do you want to have to fix a failing process in the future because its now receiving more data that it was expecting? I don't think so!

JOINs

My inefficient process example above starts with selecting some data and bringing it into my programming environment. The process then runs another query to bring in additional data and joins it to the data from my first query in my programming environment.

This is terrible!

First off, we are breaking the first principle we learned in the SELECT * section above — we are bringing back more data than we need! If we are using INNER JOIN on our two datasets, we most likely are going to be filtering out some data — data we don't need. Joining on the SQL server first will reduce our total dataset size and make our network and disk performance more efficient.

Even if we are doing something like a LEFT or FULL OUTER join where we will be keeping all of the data from one or both of our datasets, it still benefits us to perform this join on the SQL Server. Why you ask? Because the people who built SQL Server have spent hundreds or thousands of hours performance tuning and debugging their joining algorithms. The chances that you will be able to write a more efficient join algorithm is highly unlikely.

And even if you are a programming savant, why reinvent the wheel? Unless your app needs every last microsecond of performance, just use SQL Server for what it's really good at: relational data joining.

WHERE Clauses

Let's say our dbo.User table has 50 thousand rows and our dbo.Sale table has 1 million rows. If your process is only looking for active users and sales from the past month, let's say 2 thousand rows and 22,000 rows respectively, then you are causing SQL to lookup and transfer 95% more rows than your process needs. Not only does it kill network performance, but your program layer then needs to filter out this data, doing extra work that it probably can't do as efficiently as SQL Server.

If instead I would have just added predicates to the SQL WHERE clause like Active=1 and SalesDate >= DATEADD(month, -1, GETDATE()) we would have saved both time and bandwidth.

Aggregate Functions

You know what's better than sending 10,000 rows of data over the network and then summing them up in your programming layer?

Using SQL's SUM() aggregate function to reduce those 10,000 rows to just 1 row before sending it across the network.

SQL aggregate functions take many rows of data and consolidate them down into fewer rows.

SQL's aggregate functions are also flexible enough to use the OVER() clause, allowing for windowed sets within your data — basically allowing you to be even more flexible with how you aggregate your data.

Don't wait until your application layer to summarize parts of your data — do it in your SQL query instead.

Scalar Functions

Although aggregate functions do some serious heavy lifting, scalar functions that run on each row of data aren't anything to laugh at either. Although they won't reduce the number of rows in your output, they can certainly reduce the number of columns you are outputting.

For example, say you have multiple columns of data in your dataset that ultimately need to be combined into a single output column. It's much better to use ISNULL(), COALESCE(), or CASE to combine multiple columns into a single column with logic in your SQL query so less data needs to be transferred later.

Once again, reducing the amount of data you are sending over the network is key to getting faster run times.

XML and JSON Functions

Last but not least, if your process is generating XML or JSON data at some point, consider generating that data on the SQL Server. Now, generating XML and JSON data won't always improve your performance — SQL Server is best at relational tasks and not large string creation — but in many cases, especially with JSON, SQL Server can outperform even the fastest .NET libraries.

If your network is your bottle neck, then it is very possible that SQL can apply complex logic and transform your data into XML or JSON faster on the SQL Server than if you needed to transfer all of that data to another location on the network and handle those transformations in another programming language.

In short: do as much work as possible in SQL

If your SQL queries could be following any of the above techniques and they're not, then fix them…today! Checking each of your queries for any of the above inefficiencies and mitigating them will probably (always test your changes) improve the performance of your applications and processes.

And then it won't feel like your process is taking forever to run.