Warning! Are your queries vulnerable to SQL injection?

Published on: 2017-08-29

Medicine 1” by marosh is licensed under CC BY-NC-ND 2.0.

Looking for a script to find possible SQL injection vulnerabilities on your server? Scroll to the bottom of this post.

Watch this content on YouTube.


OWASP names SQL injection as the #1 vulnerability for web applications. The infamy of this attack has even made its way into the popular XKCD comic.

What is SQL injection?

A SQL query is vulnerable to SQL injection if a user can run a query other than the one that was originally intended.

Sometimes SQL injection is easier to understand with an example. Let’s use this table of registered users:

And then let’s create a simple stored procedure that will query that table:

CREATE PROCEDURE dbo.sp_GetFullName
	@ParmUserName varchar(100)
AS
BEGIN
	DECLARE @FullQuery varchar(1000)
	SET @FullQuery = 'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = ''' + @ParmUserName + ''''

	EXEC(@FullQuery);
END

The important thing to note in the query above is that we are generating a dynamic SQL statement; that is, we are building the SQL query string, and then we are executing it.

Imagine this stored procedure is running in order to display a “Welcome <Full Name>!” message in our app — a website visitor types in their@ParmUserName and we execute the stored procedure to return their full name.

Here’s our code that calls the stored procedure:

EXEC dbo.sp_GetFullName 'TFly37'

And result:

Cool. No problems so far.

However, what if our user decides to pass in the following value as their username?

EXEC dbo.sp_GetFullName 'TFly37'' or 1=1 --'

This funny looking parameter value returns this:

AHHHHHH!!!!

This user just hacked our website and viewed all of the users in our table.

In this specific case only our user’s full names were breached, but in other instances it’s just as easy for more sensitive data like passwords, social security numbers, and bank account numbers to be revealed as well (If you are looking for some fun, search for “SQL injection” on the Have I been pwned? list of Pwned websites to see all of the companies that aren’t doing a good job protecting your privacy).

So how did that example of SQL injection actually work?

Since our stored procedure executes a dynamically generated query, let’s look at what that generated query actually looks like for both of the parameters that were passed in:

Even though TFly37'' or 1=1-- doesn’t look like a intelligible input parameter, when its concatenated into our query it makes sense.

Our malicious user is basically writing their own SQL query, one that will return all of the names of our users instead of just their own. In many instances, the crafty hacker can go a step further and write additional injection code to reveal the contents of the entire user table (or other tables in the database)!

How do I prevent SQL injection?

Simple: don’t concatenate unsanitized user input data to your SQL queries.

In this example, this is easy to do: simply rewrite the stored procedure to not use dynamic SQL:

CREATE PROCEDURE dbo.sp_GetFullNameSafe
	@ParmUserName varchar(100)
AS
BEGIN
	SELECT FullName FROM dbo.RegisteredUser WHERE UserName =  @ParmUserName
END

When you don’t have dynamic SQL, you can’t be affected by SQL injection attempts.

Avoiding dynamic SQL isn’t always possible though, so what can we do in those cases? Use sp_executesql:

CREATE PROCEDURE dbo.sp_GetFullNameSafe2
	@ParmUserName varchar(100)
AS
BEGIN
	DECLARE @FullQuery nvarchar(1000)
	SET @FullQuery = N'SELECT FullName FROM dbo.RegisteredUser WHERE UserName = @UserName'

	DECLARE @ParmDefinition nvarchar(100) = N'@UserName varchar(100)';  
  
	EXEC sp_executesql @FullQuery, @ParmDefinition,  
                      @UserName = @ParmUserName;  
END

sp_executesql allows for parameterization of your dynamic SQL, removing the possibility of unwanted code injection.

Are any of my current SQL queries vulnerable to SQL injection attacks?

So SQL injection is really bad and you don’t want to become like Sony or Yahoo. How do we check to see if we have any vulnerable queries on our server?

I wrote the query below to help you start searching. It is not perfect, but it does act as a good starting point for auditing potentially injectable queries:

-- This file tries to find stored procedures and functions that *may* be vulnerable to SQL injection attacks.

-- It works by searching your database for occurences of "+" signs followed by "@", indicating that SQL parameters
-- might be getting concatenated to a dynamic SQL string.  It also checks for the existence of 'EXEC' to see if any
-- strings are being executed.

-- Not every result returned will be susceptible to SQL injection, however they should all be examined to see if they are vulnerable.

-- Originally fromn: https://github.com/bertwagner/SQLServer/blob/master/SQL%20Injection%20Vulnerabilities.sql

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT
    ROUTINE_CATALOG,
    ROUTINE_SCHEMA,
    ROUTINE_NAME,
    ROUTINE_TYPE,
    ROUTINE_DEFINITION
FROM
    INFORMATION_SCHEMA.ROUTINES
WHERE
	REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ROUTINE_DEFINITION,CHAR(0),''),CHAR(9),''),CHAR(10),''),CHAR(11),''),CHAR(12),''),CHAR(13),''),CHAR(14),''),CHAR(160),''),' ','')
		LIKE '%+@%'
	AND	
	( -- Only if executes a dynamic string
		ROUTINE_DEFINITION LIKE '%EXEC(%'
		OR ROUTINE_DEFINITION LIKE '%EXECUTE%'
		OR ROUTINE_DEFINITION LIKE '%sp_executesql%'
	)

I’ve found this script useful for myself, however if you find any issues with it please let me know, thanks!

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

How to Search and Destroy Non-SARGable Queries on Your Server

Published on: 2017-08-22

Unexpected SQL Server Performance Killers #3

Photo by Cibi Chakravarthi on Unsplash

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of “things I didn’t know I was doing wrong for years.”

Watch this post on YouTube.


Looking for a script to find non-SARGable queries on your server? Scroll to the bottom of this post.

What is a “SARGable” query?

Just because you add an index to your table doesn’t mean you get immediate performance improvement. A query running against that table needs to be written in such a way that it actually takes advantage of that index.

SARGable, or “Search Argument-able”, queries therefore are queries that are capable of utilizing indexes.

Examples please!

Okay let’s see some examples of SARGable and non-SARGable queries using my favorite beverage data.

There are non-clustered indexes on the Name and CreateDate columns

First, let’s look at a non-SARGable query:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CONVERT(CHAR(10),CreateDate,121)  = '2017-08-19'

Although this query correctly filters our rows to a specific date, it does so with this lousy execution plan:

SQL Server has to perform an Index Scan, or in other words has to check every single page of this index, to find our ‘2017–08–19’ date value.

SQL Server does this because it can’t immediately look at the value in the index and see if it is equal to the ‘2017–08–19’ date we supplied — we told it first to convert every value in our column/index to a CHAR(10) date string so that it can be compared as a string.

Since the SQL Server has to first convert every single date in our column/index to a CHAR(10) string, that means it ends up having to read every single page of our index to do so.

The better option here would be to leave the column/index value as a datetime2 datatype and instead convert the right hand of the operand to a datetime2:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CreateDate = CAST('2017-08-19' AS datetime2)

Alternatively, SQL Server is smart enough to do this conversion implicitly for us if we just leave our ‘2017–08–19’ date as a string:

SELECT Name
FROM dbo.CoffeeInventory
WHERE CreateDate = '2017-08-19'

In this scenario SQL gives us an Index Seek because it doesn’t have to modify any values in the column/index in order to be able to compare it to the datetime2 value that ‘2017–08–19’ got converted to.

This means SQL only has to read what it needs to output to the results. Much more efficient.

One more example

Based on the last example we can assume that any function, explicit or implicit, that is running on the column side of an operator will result in a query that cannot make use of index seeks, making it non-SARGable.

That means that instead of doing something non-SARGable like this:

SELECT Name, CreateDate
FROM dbo.CoffeeInventory
WHERE DAY(CreateDate)  = 19

We want to make it SARGable by doing this instead:

SELECT Name, CreateDate
FROM dbo.CoffeeInventory
WHERE 
  CreateDate  >= '2017-08-19 00:00:00' 
  AND CreateDate < '2017-08-20 00:00:00'

In short, keep in mind whether SQL Server will have to modify the data in a column/index in order to compare it — if it does, your query probably isn’t SARGable and you are going to end up scanning instead of seeking.

OK, non-SARGable queries are bad…how do I check if I have any on my server?

The script below looks at cached query plans and searches them for any table or index scans. Next, it looks for scalar operators, and if it finds any it means we have ourselves a non-SARGable query. The fix is then to rewrite the query to be SARGable or add a missing index.

-- From https://github.com/bertwagner/SQLServer/blob/master/Non-SARGable%20Execution%20Plans.sql
-- This script will check the execution plan cache for any queries that are non-SARGable.
-- It does this by finding table and index scans that contain a scalar operators

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
 
DECLARE @dbname SYSNAME
SET @dbname = QUOTENAME(DB_NAME());
 
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

SELECT
   stmt.value('(@StatementText)[1]', 'varchar(max)') AS [Query],
   query_plan AS [QueryPlan],
   sc.value('(.//Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') AS [Schema], 
   sc.value('(.//Identifier/ColumnReference/@Table)[1]', 'varchar(128)') AS [Table], 
   sc.value('(.//Identifier/ColumnReference/@Column)[1]', 'varchar(128)') AS [Column] ,
   CASE WHEN s.exist('.//TableScan') = 1 THEN 'TableScan' ELSE 'IndexScan' END AS [ScanType],
   sc.value('(@ScalarString)[1]', 'varchar(128)') AS [ScalarString]
FROM 
	sys.dm_exec_cached_plans AS cp
	CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
	CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
	CROSS APPLY stmt.nodes('.//RelOp[TableScan or IndexScan]') AS scan(s)
	CROSS APPLY s.nodes('.//ScalarOperator') AS scalar(sc)
WHERE
    s.exist('.//ScalarOperator[@ScalarString]!=""') = 1 
    AND sc.exist('.//Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1
	AND sc.value('(@ScalarString)[1]', 'varchar(128)') IS NOT NULL

 

I’ve found this script useful for myself, but if you find any issues with it please let me know, thanks!

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

Why Parameter Sniffing Isn’t Always A Bad Thing (But Usually Is)

Published on: 2017-08-08

Unexpected SQL Server Performance Killers #2

Photo by Jakob Owens on Unsplash

In this series I explore scenarios that hurt SQL Server performance and show you how to avoid them. Pulled from my collection of “things I didn’t know I was doing wrong for years.”

Prefer video? Watch this post on YouTube.

Last week we discussed how implicit conversions could be one reason why your meticulously designed indexes aren’t getting used.

Today let’s look at another reason: parameter sniffing.

Here’s the key: Parameter sniffing isn’t always a bad thing.

Most of the time it’s good: it means SQL Server is caching and reusing query plans to make your queries run faster.

Parameter sniffing only becomes a problem when the cached plan isn’t anywhere close to being the optimal plan for given input parameters.

So what’s parameter sniffing?

Let’s start with our table dbo.CoffeeInventory which you can grab from Github.

The key things to know about this table are that:

  1. We have a nonclustered index on our Name column.
  2. The data is not distributed evenly (we’ll see this in a minute)

Now, let’s write a stored procedure that will return a filtered list of coffees in our table, based on the country. Since there is no specific Country column, we’ll write it so it filters on the Name column:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
 SELECT Name, Price, Description 
 FROM Sandbox.dbo.CoffeeInventory
 WHERE Name LIKE @ParmCountry + '%'
END
GO

Let’s take a look at parameter sniffing in action, then we’ll take a look at why it happens and how to solve it.

EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica'
EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia'

Running the above statement gives us identical execution plans using table scans:

In this case we explicitly specified the parameter @ParmCountry. Sometimes SQL will parameterize simple queries on its own.

That’s weird. We have two query executions, they are using the same plan, and neither plan is using our nonclustered index on Name!

Let’s step back and try again. First, clear the query plan cache for this stored procedure:

DECLARE @cache_plan_handle varbinary(44)
SELECT @cache_plan_handle = c.plan_handle
FROM 
 sys.dm_exec_cached_plans c
 CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
WHERE 
 text like 'CREATE%CoffeeInventory%' 
-- Never run DBCC FREEPROCCACHE without a parameter in production unless you want to lose all of your cached plans...
DBCC FREEPROCCACHE(@cache_plan_handle)

Next, execute the same stored procedure with the same parameter values, but this time with the ‘Ethiopia’ parameter value first. Look at the execution plan:

EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia'
EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica'

Now our nonclustered index on Name is being utilized. Both queries are still receiving the same (albeit different) plan.

We didn’t change anything with our stored procedure code, only the order that we executed the query with different parameters.

What the heck is going on here!?

This is an example of parameter sniffing. The first time a stored procedure (or query) is ran on SQL server, SQL will generate an execution plan for it and store that plan in the query plan cache:

SELECT
 c.usecounts,
 c.cacheobjtype,
 c.objtype,
 c.plan_handle,
 c.size_in_bytes,
 d.name,
 t.text,
 p.query_plan
FROM 
 sys.dm_exec_cached_plans c
 CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
 CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) p
 INNER JOIN sys.databases d
 ON t.dbid = d.database_id
WHERE 
 text like 'CREATE%CoffeeInventory%'

All subsequent executions of that same query will go to the query cache to reuse that same initial query plan — this saves SQL Server time from having to regenerate a new query plan.

Note: A query with different values passed as parameters still counts as the “same query” in the eyes of SQL Server.

In the case of the examples above, the first time the query was executed was with the parameter for “Costa Rica”. Remember when I said this dataset was heavily skewed? Let’s look at some counts:

SELECT 
  LEFT(Name,CHARINDEX(' ',Name)) AS Country, 
  COUNT(*) AS CountryCount 
FROM dbo.CoffeeInventory 
GROUP BY 
  LEFT(Name,CHARINDEX(' ',Name))

“Costa Rica” has more than 10,000 rows in this table, while all other country names are in the single digits.

This means that when we executed our stored procedure for the first time, SQL Server generated an execution plan that used a table scan because it thought this would be the most efficient way to retrieve 10,003 of the 10,052 rows.

This table scan query plan is only optimal for Costa Rica . Passing in any other country name into the stored procedure would return only a handful of records, making it more efficient for SQL Server to use our nonclustered index.

However, since the Costa Rica plan was the first one to run, and therefore is the one that got added to the query plan cache, all other executions ended up using the same table scan execution plan.

After clearing our cached execution plan using DBCC FREEPROCCACHE, we executed our stored procedure again but with ‘Ethiopia’ as our parameter. SQL Server determined that a plan with an index seek is optimal to retrieve only 6 of the 10,052 rows in the table. It then cached that Index Seek plan, which is why the second time around the ‘Costa Rica’ parameter received the execution plan with Index Seek.

Ok, so how do I prevent parameter sniffing?

This question should really be rephrased as “how do I prevent SQL Server from using a sub-optimal plan from the query plan cache?”

Let’s take a look at some of the techniques.

1. Use WITH RECOMPILE or OPTION (RECOMPILE)

We can simply add these query hints to either our EXEC statement:

EXEC dbo.FilterCoffee @ParmCountry = 'Ethiopia' WITH RECOMPILE
EXEC dbo.FilterCoffee @ParmCountry = 'Costa Rica' WITH RECOMPILE

or to our stored procedure itself:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
 SELECT Name, Price, Description 
 FROM Sandbox.dbo.CoffeeInventory 
 WHERE Name LIKE @ParmCountry + '%'

 OPTION (RECOMPILE)
END
GO

What the RECOMPILE hint does is force SQL Server to generate a new execution plan every time these queries run.

Using RECOMPILE eliminates our parameter sniffing problem because SQL Server will regenerate the query plan every single time we execute the query.

The disadvantage here is that we lose all benefit from having SQL Server save CPU cycles by caching execution plans.

If your parameter sniffed query is getting ran frequently, RECOMPILE is probably a bad idea because you will encounter a lot of overheard to generate the query plan regularly.

If your parameter sniffed query doesn’t get ran often, or if the query doesn’t run often enough to stay in the query plan cache anyway, then RECOMPILE is a good solution.

2. Use the OPTIMIZE FOR query hint

Another option we have is to add either one of the following hints to our query. One of these would get added to the same location as OPTION (RECOMPILE) did in the above stored procedure:

OPTION (OPTIMIZE FOR (@ParmCountry UNKNOWN))

or

OPTION (OPTIMIZE FOR (@ParmCountry = 'Ethiopia'))

OPTIMIZE FOR UNKNOWN will use a query plan that’s generated from the average distribution stats for that column/index. Often times it results in an average or bad execution plan so I don’t like using it.

OPTIMIZE FOR VALUE creates a plan using whatever parameter value specified. This is great if you know your queries will be retrieving data that’s optimized for the value you specified most of the time.

In our examples above, if we know the value ‘Costa Rica’ is rarely queried, we might optimize for index seeks. Most queries will then run the optimal cached query plan and we’ll only take a hit when ‘Costa Rica’ is queried.

3. IF/ELSE

This solution allows for ultimate flexibility. Basically, you create different stored procedures that are optimized for different values. Those stored procedures have their plans cached, and then an IF/ELSE statement determines which procedure to run for a passed in parameter:

DROP PROCEDURE IF EXISTS dbo.FilterCoffee
GO
CREATE PROCEDURE dbo.FilterCoffee
@ParmCountry varchar(30)
AS
BEGIN
 IF @ParmCountry = 'Costa Rica'
 BEGIN
  EXEC dbo.ScanningStoredProcedure @ParmCountry
 END
 ELSE
 BEGIN
  EXEC dbo.SeekingStoredProcedure @ParmCountry
 END
END
GO

This option is more work (How do you determine what the IF condition should be? What happens more data is added to the table over time and the distribution of data changes?) but will give you the best performance if you want your plans to be cached and be optimal for the data getting passed in.

Conclusion

  1. Parameter sniffing is only bad when your data values are unevenly distributed and cached query plans are not optimal for all values.
  2. SQL Server caches the query plan that is generated from the first run of a query/stored procedure with whatever parameter values were used during that first run.
  3. Using the RECOMPILE hint is a good solution when your queries aren’t getting ran often or aren’t staying in the the query cache most of the time anyway.
  4. The OPTIMIZE FOR hint is good to use when you can specify a value that will generate a query plan that is efficient for most parameter values and are OK with taking a hit for a sub-optimal plan on infrequently queried values.
  5. Using complex logic (like IF/ELSE) will give you ultimate flexibility and performance, but will also be the worst for long term maintenance.

 

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!