Warning! Are your queries vulnerable to SQL injection?

"Medicine 1" by marosh is licensed under CC BY-NC-ND 2.0.

Watch this week's video on YouTube

Looking for a script to find possible SQL injection vulnerabilities on your server? Scroll to the bottom of this post.


OWASP names SQL injection as the #1 vulnerability for web applications. The infamy of this attack has even made its way into the popular XKCD comic.

What is SQL injection?

A SQL query is vulnerable to SQL injection if a user can run a query other than the one that was originally intended.

Sometimes SQL injection is easier to understand with an example. Let's use this table of registered users:

7b901-154irhcolzqmxl_r8nptesa

And then let's create a simple stored procedure that will query that table:

CREATE PROCEDURE dbo.sp_GetFullName
    @ParmUserName varchar(100)
AS
BEGIN
    DECLARE @FullQuery varchar(1000)
    SET @FullQuery = 'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = ''' + @ParmUserName + ''''

    EXEC(@FullQuery);
END

The important thing to note in the query above is that we are generating a dynamic SQL statement; that is, we are building the SQL query string, and then we are executing it.

Imagine this stored procedure is running in order to display a "Welcome !" message in our app — a website visitor types in their@ParmUserName and we execute the stored procedure to return their full name.

Here's our code that calls the stored procedure:

EXEC dbo.sp_GetFullName 'TFly37'

And result:

8b63a-1wvt3ipotzxiffpapprx6wa

Cool. No problems so far.

However, what if our user decides to pass in the following value as their username?

EXEC dbo.sp_GetFullName 'TFly37'' or 1=1 --'

This funny looking parameter value returns this:

87b93-1iyjcd4upimzl747hhell_a

AHHHHHH!!!!

This user just hacked our website and viewed all of the users in our table.

In this specific case only our user's full names were breached, but in other instances it's just as easy for more sensitive data like passwords, social security numbers, and bank account numbers to be revealed as well (If you are looking for some fun, search for "SQL injection" on the Have I been pwned? list of Pwned websites to see all of the companies that aren't doing a good job protecting your privacy).

So how did that example of SQL injection actually work?

Since our stored procedure executes a dynamically generated query, let's look at what that generated query actually looks like for both of the parameters that were passed in:

10bda-1trn-ptbppptcwnrnsqwv1w

Even though TFly37'' or 1=1-- doesn't look like a intelligible input parameter, when its concatenated into our query it makes sense.

Our malicious user is basically writing their own SQL query, one that will return all of the names of our users instead of just their own. In many instances, the crafty hacker can go a step further and write additional injection code to reveal the contents of the entire user table (or other tables in the database)!

How do I prevent SQL injection?

Simple: don't concatenate unsanitized user input data to your SQL queries.

In this example, this is easy to do: simply rewrite the stored procedure to not use dynamic SQL:

CREATE PROCEDURE dbo.sp_GetFullNameSafe
    @ParmUserName varchar(100)
AS
BEGIN
    SELECT FullName FROM dbo.RegisteredUser WHERE UserName =  @ParmUserName
END

When you don't have dynamic SQL, you can't be affected by SQL injection attempts.

Avoiding dynamic SQL isn't always possible though, so what can we do in those cases? Use sp_executesql:

CREATE PROCEDURE dbo.sp_GetFullNameSafe2
    @ParmUserName varchar(100)
AS
BEGIN
    DECLARE @FullQuery nvarchar(1000)
    SET @FullQuery = N'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = @UserName'

    DECLARE @ParmDefinition nvarchar(100) = N'@UserName varchar(100)';  

    EXEC sp_executesql @FullQuery, @ParmDefinition,  
                      @UserName = @ParmUserName;  
END

sp_executesql allows for parameterization of your dynamic SQL, removing the possibility of unwanted code injection.

Are any of my current SQL queries vulnerable to SQL injection attacks?

So SQL injection is really bad and you don't want to become like Sony or Yahoo. How do we check to see if we have any vulnerable queries on our server?

I wrote the query below to help you start searching. It is not perfect, but it does act as a good starting point for auditing potentially injectable queries:

-- This file tries to find stored procedures and functions that *may* be vulnerable to SQL injection attacks.

-- It works by searching your database for occurences of "+" signs followed by "@", indicating that SQL parameters
-- might be getting concatenated to a dynamic SQL string.  It also checks for the existence of 'EXEC' to see if any
-- strings are being executed.

-- Not every result returned will be susceptible to SQL injection, however they should all be examined to see if they are vulnerable.

-- Originally fromn: https://github.com/bertwagner/SQLServer/blob/master/SQL%20Injection%20Vulnerabilities.sql

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT
    ROUTINE_CATALOG,
    ROUTINE_SCHEMA,
    ROUTINE_NAME,
    ROUTINE_TYPE,
    ROUTINE_DEFINITION
FROM
    INFORMATION_SCHEMA.ROUTINES
WHERE
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ROUTINE_DEFINITION,CHAR(0),''),CHAR(9),''),CHAR(10),''),CHAR(11),''),CHAR(12),''),CHAR(13),''),CHAR(14),''),CHAR(160),''),' ','')
        LIKE '%+@%'
    AND 
    ( -- Only if executes a dynamic string
        ROUTINE_DEFINITION LIKE '%EXEC(%'
        OR ROUTINE_DEFINITION LIKE '%EXECUTE%'
        OR ROUTINE_DEFINITION LIKE '%sp_executesql%'
    )

I've found this script useful for myself, however if you find any issues with it please let me know, thanks!

How to Eliminate Ugly Nested REPLACE() Functions

"On white: Who you really are" by James Jordan is licensed under CC BY-ND 2.0

Watch this week's video on YouTube

How many times have you had to transform some column value and ended up stacking several nested SQL REPLACE() functions like this?

-- Input: Red, Blue, Green
-- Output: RGB
SELECT 
    REPLACE(
    REPLACE(
    REPLACE(
    REPLACE(c.Colors,'Red','R')
    ,'Green','G')
    ,'Blue','B')
    ,', ','') AS Colors
FROM
    (SELECT 'Red, Green, Blue' AS Colors) c

Ugly right? And that's after careful formatting to try and make it look readable. I could have just left it as:

-- Input: Red, Blue, Green
-- Output: RGB
SELECT REPLACE(REPLACE(REPLACE(REPLACE(c.Colors,'Red','R'),'Green','G'),'Blue','B'),', ','') AS Colors
FROM
    (SELECT 'Red, Green, Blue' AS Colors) c

Here we only have 4 nested REPLACE functions. My shameful record is 29. I'm not proud of it, but sometimes it's the only way to get things done.

Not only are these nested REPLACE() functions difficult to write, but they are difficult to read too.

Instead of suffering through all of that ugly nesting, what you can do instead is use CROSS APPLY:

-- Input: Red, Blue, Green
-- Output: RGB
SELECT 
    s.Colors
FROM
    (SELECT 'Red, Green, Blue' AS Colors) c
    CROSS APPLY (SELECT REPLACE(c.Colors,'Red','R') AS Colors) r
    CROSS APPLY (SELECT REPLACE(r.Colors,'Green','G') AS Colors) g
    CROSS APPLY (SELECT REPLACE(g.Colors,'Blue','B') AS Colors) b
    CROSS APPLY (SELECT REPLACE(b.Colors,', ','') AS Colors) s

Technically the CROSS APPLY solution uses more characters. But it is infinitely more readable.

And the server? The server doesn't care about the additional characters —it still compiles it down to the same 1s and 0s:

a78b7-1grof8vly9hnzts5u2jh4gq

So next time you have to nest several REPLACE() functions (or any other string functions), do yourself a favor and make it more readable by using CROSS APPLY instead. Your future self will thank you.

Are your indexes being thwarted by mismatched datatypes?

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of "things I didn't know I was doing wrong for years."


Watch this week's video on YouTube

Have you ever encountered a query that runs slowly, even though you've created indexes for it?

There's a few different reasons why this may happen. The one I see most frequently happens in the following scenario.

I'll have an espresso please

Let's say I have a table dbo.CoffeeInventory of coffee beans and prices that I pull from my favorite green coffee bean supplier each week. It looks something like this:

-- Make sure Actual Execution Plan is on
-- Let's see what our data looks like
SELECT * FROM dbo.CoffeeInventory

If you want to follow along, you can get this data set from this GitHub Gist

I want to be able to efficiently query this table and filter on price, so next I create an index like so:

CREATE CLUSTERED INDEX CL_Price ON dbo.CoffeeInventory (Price)

Now, I can write my query to find out what coffee prices are below my willingness to pay:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < 6.75

You would expect this query to be blazing fast and use a clustered index seek, right?

WRONG!

What the heck?

Why is SQL scanning the table when I added a clustered index on the column that I am filtering in my predicate? That's not how it's supposed to work!

Well dear reader, if we look a little bit closer at the table scan operation, we'll notice a little something called CONVERT_IMPLICIT:

CONVERT_IMPLICIT: ruiner of fast queries

What is CONVERT_IMPLICIT doing? Well as it implies, it's having to convert some data as it executes the query (as opposed to me having specified an explicit CAST() or CONVERT() function in my query).

The reason it needs to do this is because I defined my Price column as a VARCHAR(5):

Who put numeric data into a string datatype? Someone who hasn't had their coffee yet today.

In my query however, I'm doing a comparison against a number WHERE Price < 6.75. SQL Server is saying it doesn't know how to compare a string to a number, so it has to convert the VARCHAR string to a NUMERIC(3,2).

This is painful.

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can't seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn't only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

<https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine>

That's a lot of orange circles/implicit conversions!

How can I query my coffee faster?

Well in this scenario, we have two options.

  1. Fix the datatype of our table to align with the data actually being stored in this (data stewards love this).
  2. Not cause SQL Server to convert every row in the column.

Number 1 above is self-explanatory, and the better option if you can do it. However, if you aren't able to modify the column type, you are better off writing your query like this:

SELECT Name, Price FROM dbo.CoffeeInventory WHERE Price < '6.75'

d03ed-1uzge0e3lizkhtodyonsfmg

Since we do a comparison of equivalent datatypes, SQL Server doesn't need to do any conversions and our index gets used. Woo-hoo!

What about the rest of my server?

Remember that chart above? There are a lot of different data comparisons that can force a painful column side implicit conversion by SQL Server.

Fortunately, Jonathan Kehayias has written a great query that helps you find column side implicit conversions by querying the plan cache. Running his query is a great way to identify most of the implicit conversions happening in your queries so you can go back and fix them — and then rejoice in your improved query performance!

My Most Embarrassing SQL Moment

T-SQL Tuesday #92: Lessons Learned the Hard Way

55ae8-1lh0mvkliatliiikt0vlyow

This post is a response to this month's T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month's topic is Lessons Learned the Hard Way.


Watch this week's video on YouTube

"Is this your query that's killing the server?"

It was my first week on the job and I was learning to query one of our major databases.

Up until that point, my SQL experience was limited to working on a *tiny* e-commerce database. Query performance was never something I had to deal with because any query I wrote, no matter how poorly written, would always execute quickly.

This new database I was working on though had tables with a billion+ rows. I should have been more conscious about how I was writing my joins and filtering my records, but I wasn't. I wrote my query and executed it in SQL Server Management Studio.

About 20 minutes into my query's execution, I received an email from my new DBA, and it looked something like this:

Uhh, there might be a problem here

"Is this your query that's killing the server?"

Oops.

I don't think my mouse ever moved to the stop execution button as quickly as it did that moment.

I was incredibly embarrassed to have brought our production server to a crawl. I was also incredibly embarrassed to have had my first interaction with my new DBA be about a query that created major problems for him.

Although there were no long-term damages from my server-crushing query, it was a scenario that I definitely didn't want to relive again in the future.

Next time: don't do that again

Obviously, this was an experience where I learned that maybe I shouldn't write queries against unfamiliar data in production.

  • I should have been practicing on a dev database.
  • I should have looked at table meta data and made sure I understood relationships between tables better.
  • I should have done some more preliminary querying with more restrictive filters to be able to catch performance problems earlier on with smaller result sets.
  • I should have examined what indexes were available and made sure I was attempting to use them.
  • I should have used a (NOLOCK) if I absolutely had to test on the production data so that at the very least I wouldn't prevent the high transaction ETLs from modifying data in that database.

All of those "should haves" quickly became my checklist for what to do before running any query in an unfamiliar environment.

I've still written plenty of ugly and inefficient queries since then, however none of them ever caused me to bring the SQL server to a halt like I did in my first week. That was one lesson that I learned the hard way.

How to Put SQL Column Names Onto Multiple Lines in SSMS

A few keystrokes and BAM! A mostly formatted query

SQL in 60 Seconds is a series where I share SQL tips and tricks that you can learn and start using in less than a minute.

Watch this week's video on YouTube

Have you ever copied and pasted a query into SQL Server Management Studio and been annoyed that the list of column names in the SELECT statement were all on one line?

There are 30 columns here. Ugh.

SELECT Col1, Col2, Col3,  Col4, Col5,Col6,Col7, Col8, Col9, Col10,Col11,Col12,Col13,Col14,Col15,Col16,Col17,Col18,Col19,Col20,Col21,Col22,Col23,Col24,Col25,Col26,Col27,Col28,Col29,Col30 FROM dbo.MyTable

You can make the query easier to read by putting each column name onto its own line.

Simply open the Find and Replace window (CTRL + H) and type in ,(:Wh)* for the Find value and ,nt for the Replace value (in some versions of SSMS you may have better luck using ,(:Wh|t| )* in the Find field). Make sure "Use Regular Expressions" is checked and press Replace All:

Make sure the regular expression icon/box is checked

A few keystrokes and BAM! A mostly formatted query

The magic you just used is a Regular Expression, and Microsoft has its own flavor used in SSMS and Visual Studio. Basically, we found text that

  • began with a comma (,)
  • followed by any whitespace (:Wh) (line break, tab, space, etc…)
  • (in newer versions of SSMS we add |t| to indicate or tab or space)
  • and replaced it with a comma (,) and a new line (n) and tab (t).

Sure, this trick isn't going to give you the same output as if you used a proper SQL formatter, but this technique is free and built straight into SSMS.