4 Reasons To Avoid VARCHAR(8000)

Published on: 2019-03-19

Watch this week’s episode on YouTube

When first learning SQL Server, I had the brilliant idea of defining every column on my table as VARCHAR(8000). After all, the VARCHAR datatype only consumes as much storage space as the string itself (plus two bytes), so even if I defined the max length of a column as 8000, I wouldn’t be wasting storage space if my data was smaller than that.

My development life would be easier too since I wouldn’t have to think about the kind of data I was storing in my columns; I could define everything as VARCHAR(8000) once and never have to go back to make any changes. Brilliant!

While I was correct about not wasting storage space, it turns out that defining every column as VARCHAR(8000) is a terrible idea.
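
If you want to see that storage behavior for yourself, a quick sketch like this (the temp table and values are just for illustration) shows that the same five-character string takes five bytes of storage whether the column is defined as VARCHAR(8000) or VARCHAR(50):

DROP TABLE IF EXISTS #VarcharTest;
CREATE TABLE #VarcharTest (BigColumn varchar(8000), SmallColumn varchar(50));
INSERT INTO #VarcharTest VALUES ('Hello', 'Hello');

/* DATALENGTH reports 5 bytes for both columns, regardless of the declared max length */
SELECT
    DATALENGTH(BigColumn) AS BigColumnBytes,
    DATALENGTH(SmallColumn) AS SmallColumnBytes
FROM #VarcharTest;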


COUNT, DISTINCT, and NULLs

Published on: 2019-02-19

Watch this week’s episode on YouTube.

One thing I see fairly often (and am occasionally guilty of myself) is using COUNT(DISTINCT) and DISTINCT interchangeably to get an idea of the number of unique values in a column.

While they will sometimes give you the same results, the two methods are not always interchangeable; it depends on the data.

Let’s start off with some test data. Important to note are the duplicate values, including the NULLs:

DROP TABLE IF EXISTS ##TestData;
CREATE TABLE ##TestData (Id int identity, Col1 char(1) NULL); 
INSERT INTO ##TestData VALUES ('A'); 
INSERT INTO ##TestData VALUES ('A'); 
INSERT INTO ##TestData VALUES ('B'); 
INSERT INTO ##TestData VALUES ('B');
INSERT INTO ##TestData VALUES (NULL); 
INSERT INTO ##TestData VALUES (NULL); 
CREATE CLUSTERED INDEX CL_Id ON ##TestData (Col1); 

If you want to know how many unique values are in Col1, you might write something like this:

SELECT COUNT(DISTINCT Col1) 
FROM ##TestData 

Two distinct values, great! Except…weren’t there some NULLs in there? If we want to see the actual values instead of just the count:

SELECT DISTINCT Col1 
FROM ##TestData 

Interesting: the plain DISTINCT shows there are three unique values, but our previous COUNT(DISTINCT Col1) query returned a count of two.

And while the SQL Server documentation specifies that DISTINCT will include NULLs while COUNT(DISTINCT) will not, this is not behavior that many people find intuitive.

Viewing and COUNTing the NULLs

Sometimes we need the opposite of the default behavior of DISTINCT and COUNT.

For example, viewing the unique values in a column and not including the nulls is pretty straightforward:

SELECT DISTINCT 
    Col1 
FROM 
    ##TestData 
WHERE  
    Col1 IS NOT NULL 

Getting the opposite effect of returning a COUNT that includes the NULL values is a little more complicated. One thing we can try is to COUNT all of our DISTINCT non-null values and then combine that with a COUNT DISTINCT for our NULL values:

SELECT COUNT(DISTINCT Col1) + COUNT(DISTINCT CASE WHEN Col1 IS NULL THEN 1 END)
FROM ##TestData;

While this logic is easy to interpret, it forces us to read our column of data twice, once for each COUNT, which is not very efficient on larger sets of data.

Another thing we can try is to put in a placeholder value (that doesn’t exist elsewhere in the column’s data) so that COUNT will include it in its calculation:

SELECT 
    /* ~~~ will never exist in our data */
    COUNT(DISTINCT ISNULL(Col1,'~~~')) 
FROM 
    ##TestData 

The ISNULL here functions the same as the CASE expression in our first attempt, without having to read the table twice. However, the Compute Scalar that appears to the left of our Clustered Index Scan in the execution plan will start to become painful as our data size increases, since SQL Server needs to check each and every row and convert any NULLs it finds. Not to mention that after computing all of those ~~~ values, SQL Server needs to re-sort the data to be able to find the DISTINCT values.

That leads us to a final attempt: using a DISTINCT in a derived table (to return our NULL) and then taking a count of that:

SELECT COUNT(*)  
FROM (SELECT DISTINCT Col1 FROM ##TestData) v 

This last option eliminates the Compute Scalar and extra sort. While there might be even better options out there for accomplishing the same task, at this point I’m pretty happy with how this will perform.
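
One more variation worth benchmarking is to add 1 to the distinct non-null count whenever at least one NULL exists, which avoids a second DISTINCT aggregate; something like the sketch below could work, though I would test it against the derived table version before trusting it:

SELECT
    /* MAX returns 1 if any NULL exists in Col1, otherwise 0 */
    COUNT(DISTINCT Col1) + MAX(CASE WHEN Col1 IS NULL THEN 1 ELSE 0 END)
FROM ##TestData;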

What’s the Point?

SQL Server’s documentation says that COUNT(*) returns the items in a group while COUNT(Col1) returns the non-null values in the group.

/* returns items in the group.  Includes nulls */
SELECT COUNT(*) 
FROM ##TestData;

/* returns non null values in group */
SELECT COUNT(Col1) 
FROM ##TestData;

Because of this, COUNT, DISTINCT, and NULLs have a special relationship with each other that isn’t always intuitive.

Whenever using COUNT or DISTINCT, test with NULLs to verify that SQL Server is handling them the way you expect.


The 5 Scariest Moments for a SQL Server Developer

Published on: 2018-10-30

While families and friends are scaring each other this Halloween week with stories of ghosts and ghouls, I thought it’d be way scarier to talk about truncated tables and source control.

Accidental Data Deletion

You’ve spent all morning loading millions of records into your tables.  Some of that data was painstakingly loaded by hand from flat files, and some lookup values you even authored manually because “you’ll only have to do this once.”

You are jumping around between SSMS windows, running this query here and that query there, pressing F5 like there’s no tomorrow. And then right before you press F5 to finish all of your work…

…you realize that you ran the whole tab’s worth of code instead of the one statement you meant to highlight.  “Commands completed successfully” flashes in your Messages window, and in horror you discover you accidentally ran a truncate table statement!

Ok so this is pretty bad.

You know backups are being taken but…can the DBA restore the most recent data you spent all morning working on?  You tell her about your misfortune (“…yeah one of my coworkers accidentally deleted the data…”) and hold your breath.

After a few minutes she gets back to you saying everything is back up and working.  Saved by good backup practices…this time!

Problems With Code You Inherited

You receive an instant message from your DBA.

DBA: “Hey <your name here>, how are you doing?”

You: “Good, how are you?”

DBA: “Look, instance XYZ has been at 100% CPU utilization for the past hour.  I see you have a query that has been running that whole time…and it’s using a scalar function.”

You: “…”

You: “Sorry.  I’m just running this job for Bill since he left a few weeks ago.  Obviously there are some problems with the way he wrote some of these queries and I haven’t had a chance to look them over yet.”

DBA: “That’s ok.  I know you would never write a scalar function that totally destroys the server.  Anyway, I was able to easily rewrite the query into a table-valued function that should run a lot more efficiently.  Is it ok if I kill your original query and run this new one instead?”

You: “Yeah sure, that’d be great.  Thank you!”
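
The function names and logic below are made up, but the kind of rewrite the DBA is describing might look something like this: a row-by-row scalar function replaced with an inline table-valued function that SQL Server can expand into the calling query’s plan.

/* Hypothetical scalar function: executed once per row, hiding its cost from the optimizer */
CREATE OR ALTER FUNCTION dbo.GetTotalWithTax (@Amount decimal(10,2))
RETURNS decimal(10,2)
AS
BEGIN
    RETURN @Amount * 1.08;
END;
GO

/* The same logic as an inline table-valued function */
CREATE OR ALTER FUNCTION dbo.GetTotalWithTax_TVF (@Amount decimal(10,2))
RETURNS TABLE
AS
RETURN (SELECT @Amount * 1.08 AS TotalWithTax);
GO

/* Usage sketch: CROSS APPLY the inline function instead of calling the scalar function per row
   (dbo.Sales is a hypothetical table)

SELECT s.Amount, t.TotalWithTax
FROM dbo.Sales AS s
CROSS APPLY dbo.GetTotalWithTax_TVF(s.Amount) AS t;
*/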

Missing Index Details

You’ve been having a good day: no outages, no annoying customer requests, and the only thing keeping you from going home is to finish tuning the query you are working on.

You execute your query after making all necessary changes and…nothing.  The executing query icon just keeps spinning, and spinning, and spinning…

You go grab some water, and by that time the query has finished.  You switch over to the Execution plan tab and…you see SQL Server’s Missing Index hint recommending that you create an index.

Now don’t be confused: this text looks green and friendly.  You might be tempted to follow the recommendation and add the index it suggests, but don’t!

That siren song of easy query tuning will steer you straight into the rocks known as unnecessary index maintenance. 

Just before you execute Microsoft’s missing index details script, you realize that if you add some include columns and reorder a couple of the key columns, you’ll have an index that will be applicable to all of your table’s queries.

You avoid getting shipwrecked and create an index that is useful for more than just this one query – phew!
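
As an illustration (the table, column, and index names here are all hypothetical), instead of creating the narrow index the hint spells out, you might fold its suggestion into one wider index that serves several queries:

/* What the Missing Index hint might suggest for this one query */
-- CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
-- ON dbo.Orders (CustomerId)
-- INCLUDE (OrderDate);

/* A consolidated version: reordered key columns plus extra INCLUDE columns
   so the same index also covers the table's other frequent queries */
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalDue, Status);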

Email Subjects That Read: “Does anyone know why none of my queries are running?”

Last night’s data load failed.  “No big deal,” you assure yourself. “I’ll just run it now – no one will know the difference.”

You kick-off your ETL queries and decide to go make some oatmeal.  At your office’s kitchen you run into a coworker you haven’t seen in a while, and you get to talking about work, life, that weekend’s football game that was lost in overtime…that’s been happening a lot this season, huh?

On the way back to your desk, one of your customers stops you to ask some questions, your manager needs to discuss project planning with you, a new employee is introduced, etc…

By this point the office is busy with activity.  You finally make it back to your desk MUCH later than you originally intended.  What were you doing again?

You can’t remember, so you decide to check your email.  You notice a chain of emails asking “Does anyone know what’s going on with server ABC?  Things are running so SLOW!  I can’t run even the simplest queries!”

OH CRAP you think.  You look at your load process queries and they are still running with no end in sight.  Do you kill the queries and wait for the hour-long rollback to complete or…

…just then your queries finish.  A minute later you receive another email saying “seems like everything is working fine now.” 

Another bullet dodged.

Where Did My Code Go?

Working on this team project has been a fun experience for you – responsibilities have been shared and you and your coworkers have been delivering on deadlines ahead of schedule.  You’ve never been able to write queries and make progress toward the end product as quickly as this.

You sit down to start working and remember that you had to debug a query from yesterday afternoon.  You go and open the stored procedure in dev and…it’s not the same query anymore.

Sure, parts of it are the same, but there are some extra parameters and some of the logic has changed.  What the heck…?

“Did anyone mess with USP_GetProducts since yesterday?” you ask your aisle of teammates.

“Uhh…I updated it this morning to make it work for my business logic requirement,” said intern Sam.  “Is something wrong?”

“Well, it’s totally different from the way I had it when I left yesterday evening.  I bet we got out of sync and the merge conflict wasn’t resolved correctly” you say.

“Oh yeah, I got some message about merging.  I just clicked the ‘keep my version’ button,” says Sam.

“No big deal,” you say with confidence.  “I can pull my version out of source control.  Come on over and I can show you how we can merge both of our queries together.”

Source control saves the day.

Happy Halloween.
