Optimizing for Ad Hoc Workloads

Watch this week's video on YouTube

The execution plan cache is a great feature: after SQL Server goes through the effort of generating a query plan, it saves that plan in the plan cache so it can be reused later.

One downside to SQL Server caching almost all plans by default is that some of those plans won't ever get reused. Those single-use plans sit in the plan cache, inefficiently tying up a piece of the server's memory.

Today I want to look at a feature that will keep these one-time use plans out of the plan cache.

Plan Stubs

Instead of filling the execution plan cache with plans that will never get reused, the optimize for ad hoc workloads option will cache a plan stub instead of the full plan. The plan stub is significantly smaller in size and is only replaced with the full execution plan when SQL Server recognizes that the same query has executed multiple times.

This reduces the amount of space one-time queries take up in the cache, allowing more reusable plans to remain in the cache for longer periods of time.

Enabling this server-level feature (a database scoped version also exists; see below) is as easy as:

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'optimize for ad hoc workloads', 1;
GO
RECONFIGURE;
GO
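
If you'd rather scope it to a single database, newer versions expose a database scoped equivalent (this assumes SQL Server 2019+ or Azure SQL Database):

-- Database scoped version (assumes SQL Server 2019+ or Azure SQL Database)
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;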

Once enabled, you can watch the plan stubs take up less space in the cache:

-- Run each of these queries once
DECLARE @Username varchar = 'A'
SELECT UserName 
FROM IndexDemos.dbo.[User] 
WHERE UserName like @Username+'%';
GO

DECLARE @Username varchar = 'B'
SELECT UserName 
FROM IndexDemos.dbo.[User] 
WHERE UserName like @Username+'%';
GO

SELECT 
    cp.cacheobjtype,
    cp.objtype,
    cp.plan_handle,
    cp.size_in_bytes,
    qp.query_plan,
    st.text
FROM
    sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
    INNER JOIN sys.dm_exec_query_stats qs
        ON cp.plan_handle = qs.plan_handle
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE 
    st.text like 'DECLARE @Username varchar =%';

At 424 bytes each, these plan stubs are tiny!

Now if we run our second query filtering on UserName LIKE 'B%' again and then check the plan cache, we'll notice the stub is replaced with an actual compiled plan:

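If you want to watch that promotion happen yourself, a minimal sketch (re-using the same demo objects and cache query from above) looks like this:

-- Run the 'B%' query a second time so SQL Server swaps the stub for a full plan
DECLARE @Username varchar = 'B'
SELECT UserName 
FROM IndexDemos.dbo.[User] 
WHERE UserName like @Username+'%';
GO

-- Check the cache again: cacheobjtype changes from 'Compiled Plan Stub' to 'Compiled Plan'
SELECT cp.cacheobjtype, cp.objtype, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text like 'DECLARE @Username varchar =%';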

The downside to plan stubs is that they add some CPU load to our server: each query gets compiled twice before it can be reused from the cache.  However, since plan stubs reduce the size of our plan cache, more reusable queries can stay cached for longer periods of time.

Great! All my cache problems will be solved

Not necessarily.

If your workload truly involves lots of ad hoc queries (like many analysts all working on different problems or dynamic SQL that's generating completely different statements on every execution), enabling Optimize for Ad hoc Workloads may be your best option (Kimberly Tripp also has a great alternative: clearing single use plans automatically on a schedule).
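
If you go the scheduled-cleanup route instead, a rough sketch of the idea (not Kimberly's exact script - just freeing single-use ad hoc plans one handle at a time from a scheduled job) might look like this:

-- Sketch: remove ad hoc plans that have only ever been used once
DECLARE @PlanHandle varbinary(64);

DECLARE plan_cursor CURSOR FOR
    SELECT plan_handle
    FROM sys.dm_exec_cached_plans
    WHERE objtype = 'Adhoc' AND usecounts = 1;

OPEN plan_cursor;
FETCH NEXT FROM plan_cursor INTO @PlanHandle;
WHILE @@FETCH_STATUS = 0
BEGIN
    DBCC FREEPROCCACHE(@PlanHandle); -- frees just this one plan
    FETCH NEXT FROM plan_cursor INTO @PlanHandle;
END
CLOSE plan_cursor;
DEALLOCATE plan_cursor;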

However, oftentimes single-use query plans have a more nefarious origin: unparameterized queries. In this case, enabling Optimize for Ad hoc Workloads may not negatively impact your server, but it certainly won't help. Why? Because those unparameterized queries will still generate a new plan on every execution.

Brent Ozar has a good overview of why this happens, but the short answer is to force parameterization on your queries. When you enable forced parameterization, SQL Server will automatically parameterize your queries if they aren't already, reducing the number of one-off query plans in your cache.
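
Enabling it is a single database-level setting (swap in your own database name):

-- Turn on forced parameterization for one database (YourDatabase is a placeholder)
ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED;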

Whether you are dealing with too many single use queries on your server or some other problem, just remember to find the root cause of the problem instead of just treating the symptoms.

Displaying Long Values in SSMS

Watch this week's video on YouTube

I write a lot of dynamic SQL and frequently encounter variables that contain many characters:

DECLARE @LongValue NVARCHAR(MAX) = CAST('' AS NVARCHAR(MAX)) + 
N'SELECT
    ''A'' AS AShortValue,
    '''+REPLICATE(N'A',4000)+''' as ALongValue
ORDER BY 1';

This variable is 4059 characters long, and when I execute it, it runs great.

SELECT LEN(@LongValue); -- 4059 characters
EXEC(@LongValue);

A homage to one of my favorite Uncyclopedia entries.

If my programmatically built query had an error in it, the first thing I'd want to do when debugging it would be to see the text of the entire @LongValue variable.

I could do this by just saying SELECT @LongValue, and while recent versions of SSMS will display the whole value for me, it completely loses my formatting, which stinks (and is especially bad if there are any comments prefixed with -- in the query):

Need an ultra HD wide display to fit this all on one screen.

I can say PRINT @LongValue, which will keep the formatting, but it will get trimmed at 4,000 characters (notice the missing ORDER BY):
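
For reference, reproducing that truncation takes just one line:

PRINT @LongValue; -- nvarchar output is trimmed at 4,000 characters, so the ORDER BY never prints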


Some Better Ways

Erik Darling posts one solution to this problem in his T-SQL Tuesday #104 entry (as well as some other problems/solutions for lengthy SQL variables). Specifically he links to a SQL string printing script that will loop through the lengthy variable and print everything while maintaining formatting:

Not perfectly formatted, but good enough.

And while I like using that stored procedure on my primary server, I'm too lazy to install it everywhere I need it.

Instead, I have a couple of go-to solutions that work on all SQL Server instances from 2008 forward.

Solution 1: CAST to XML

SELECT CAST(@LongValue AS XML) AS LongValue

Casting the long variable to XML allows SSMS to generate a clickable, single-row result that preserves formatting:

IntelliSense complains, but I'm OK with it.

The only downside to this approach is that certain characters, like "<" and ">", can't be converted to XML:


Solution 2: FOR XML PATH

In a slight variation on Solution 1, we can get similar results using FOR XML PATH:

SET @LongValue = '<' + @LongValue -- Let's add in an invalid character
SELECT @LongValue FOR XML PATH('')

FOR XML PATH is one of the most abused SQL Server functions.

In this solution, the "<" is escaped to "&lt;", which isn't perfect, but at least my variable can be displayed with formatting intact.  A quick find and replace for any escaped characters and I'm good to go.

Good Enough

These techniques aren't perfect, but for purposes of debugging dynamically generated code they are good enough.

Maybe one day SSMS will print longer strings or include a syntax formatter and I won't care nearly as much.

And if not, I'll happily continue to abuse FOR XML to do things other than generate XML documents.

Building Dynamic Table-Driven Queries


This post is a response to this month's T-SQL Tuesday #104 prompt by me! T-SQL Tuesday is a way for SQL Server bloggers to share ideas about different database and professional topics every month.

This month's topic asks: what code would you hate to live without?


Watch this week's video on YouTube

When given the choice between working on new projects versus maintaining old ones, I'm always more excited to work on something new.

That means that when I build something that is going to be used for years to come, I try to build it so that it will require as little maintenance as possible in the future.

One technique I use for minimizing maintenance is making my queries dynamic.  Dynamic queries, while not right for every situation, do one thing really well: they allow you to modify functionality without needing a complete rewrite when your data changes.  The way I look at it, it's much easier to add rules and logic to rows in a table than to modify a table's columns or structure.

To show you what I mean, let's say I want to write a query selecting data from model.sys.database_permissions:

SELECT class
      ,class_desc
      ,major_id
      ,minor_id
      ,grantee_principal_id
      ,grantor_principal_id
      ,type
      ,permission_name
      ,state
      ,state_desc
  FROM model.sys.database_permissions

Writing the query as above is pretty simple, but it isn't flexible in case the table structure changes in the future or if we want to programmatically write some conditions.

Instead of hardcoding the query as above, here is a general pattern I use for writing dynamic table-driven queries.  SQL Server has the handy views sys.all_views and sys.all_columns that show information about what columns are stored in each table/view:


Using these two views, I can use this dynamic SQL pattern to build the same exact query as above:

-- Declare some variables up front
DECLARE 
    @FullQuery nvarchar(max),
    @Columns nvarchar(max),
    @ObjectName nvarchar(128)

-- Build our SELECT statement and schema+table name
SELECT 
    @Columns = COALESCE(@Columns + ', ', '') + '[' + c.[name] + ']',
    @ObjectName = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
FROM 
    sys.all_views o  
    INNER JOIN sys.schemas s
        ON o.schema_id = s.schema_id
    INNER JOIN sys.all_columns c
        ON o.object_id = c.object_id
WHERE 
    o.[name] = 'database_permissions'
ORDER BY
    c.column_id 

-- Put all of the pieces together and execute
SET @FullQuery = 'SELECT ' + @Columns + ' FROM ' + @ObjectName

EXEC(@FullQuery)

Building a dynamic statement like this works by constructing my SELECT statement as a string based on the values stored in the all_columns view.  If a column is ever added to this view, my dynamic code will handle it (I wouldn't expect this view to change much in future versions of SQL Server, but in other real-world tables I regularly expect the data to change).

Yes, writing certain queries dynamically like this means more up front work.  It also means some queries won't run to their full potential (not necessarily reusing plans, not tuning every individual query, needing to be thoughtful about SQL injection attacks, etc...).  There are A LOT of downsides to building queries dynamically like this.

But dynamically built queries make my systems flexible and drastically reduce the amount of work I have to do down the road.  In the next few weeks I hope to go into this type of dynamically built, table-driven process in more detail (so you should see the pattern in the example above get reused soon!).

Converting JSON to SQL Server CREATE TABLE Statements

Watch this week's video on YouTube

Tedious, repetitive tasks are the bane of any lazy programmer.  I know, because I am one.

One such repetitive task that I find comparable to counting grains of rice is building database layouts from JSON data sources.

While some online services exist that will parse JSON objects into database structures, I don't like using them because I don't trust the people running those sites with my data.  Nothing personal against them, I just don't want to be passing my data through their servers.

My solution to this problem was to write a query that will parse my unfamiliar JSON documents into a series of CREATE TABLE statements.

Automatically Generating A SQL Database Schema From JSON

You can always get the most recent version of the query from GitHub, but I'll post the current version below so that it's easier to explain in this post:

/*
This code takes a JSON input string and automatically generates
SQL Server CREATE TABLE statements to make it easier
to convert serialized data into a database schema.

It is not perfect, but should provide a decent starting point when starting
to work with new JSON files.

A blog post with more information can be found at https://bertwagner.com/2018/05/22/converting-json-to-sql-server-create-table-statements/
*/
SET NOCOUNT ON;

DECLARE 
    @JsonData nvarchar(max) = '
        {
            "Id" : 1,
            "IsActive":true,
            "Ratio": 1.25,
            "ActivityArray":[true,false,true],
            "People" : ["Jim","Joan","John","Jeff"],
            "Places" : [{"State":"Connecticut", "Capitol":"Hartford", "IsExpensive":true},{"State":"Ohio","Capitol":"Columbus","MajorCities":["Cleveland","Cincinnati"]}],
            "Thing" : { "Type":"Foo", "Value" : "Bar" },
            "Created_At":"2018-04-18T21:25:48Z"
        }',
    @RootTableName nvarchar(4000) = N'AppInstance',
    @Schema nvarchar(128) = N'dbo',
    @DefaultStringPadding smallint = 20;

DROP TABLE IF EXISTS ##parsedJson;
WITH jsonRoot AS (
    SELECT 
        0 as parentLevel, 
        CONVERT(nvarchar(4000),NULL) COLLATE Latin1_General_BIN2 as parentTableName, 
        0 AS [level], 
        [type] ,
        @RootTableName COLLATE Latin1_General_BIN2 AS TableName,
        [key] COLLATE Latin1_General_BIN2 as ColumnName,
        [value],
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColumnSequence
    FROM 
        OPENJSON(@JsonData, '$')
    UNION ALL
    SELECT 
        jsonRoot.[level] as parentLevel, 
        CONVERT(nvarchar(4000),jsonRoot.TableName) COLLATE Latin1_General_BIN2, 
        jsonRoot.[level]+1, 
        d.[type],
        CASE WHEN jsonRoot.[type] IN (4,5) THEN CONVERT(nvarchar(4000),jsonRoot.ColumnName) ELSE jsonRoot.TableName END COLLATE Latin1_General_BIN2,
        CASE WHEN jsonRoot.[type] IN (4) THEN jsonRoot.ColumnName ELSE d.[key] END,
        d.[value],
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColumnSequence
    FROM 
        jsonRoot
        CROSS APPLY OPENJSON(jsonRoot.[value], '$') d
    WHERE 
        jsonRoot.[type] IN (4,5) 
), IdRows AS (
    SELECT 
        -2 as parentLevel,
        null as parentTableName,
        -1 as [level],
        null as [type],
        TableName as Tablename,
        TableName+'Id' as columnName, 
        null as [value],
        0 as columnsequence
    FROM 
        (SELECT DISTINCT tablename FROM jsonRoot) j
), FKRows AS (
    SELECT 
        DISTINCT -1 as parentLevel,
        null as parentTableName,
        -1 as [level],
        null as [type],
        TableName as Tablename,
        parentTableName+'Id' as columnName, 
        null as [value],
        0 as columnsequence
    FROM 
        (SELECT DISTINCT tableName,parentTableName FROM jsonRoot) j
    WHERE 
        parentTableName is not null
)
SELECT 
    *,
    CASE [type]
        WHEN 1 THEN 
            CASE WHEN TRY_CONVERT(datetime2, [value], 127) IS NULL THEN 'nvarchar' ELSE 'datetime2' END
        WHEN 2 THEN 
            CASE WHEN TRY_CONVERT(int, [value]) IS NULL THEN 'float' ELSE 'int' END
        WHEN 3 THEN 
            'bit'
        END COLLATE Latin1_General_BIN2 AS DataType,
    CASE [type]
        WHEN 1 THEN 
            CASE WHEN TRY_CONVERT(datetime2, [value], 127) IS NULL THEN MAX(LEN([value])) OVER (PARTITION BY TableName, ColumnName) + @DefaultStringPadding ELSE NULL END
        WHEN 2 THEN 
            NULL
        WHEN 3 THEN 
            NULL
        END AS DataTypePrecision
INTO ##parsedJson
FROM jsonRoot
WHERE 
    [type] in (1,2,3)
UNION ALL SELECT IdRows.parentLevel, IdRows.parentTableName, IdRows.[level], IdRows.[type], IdRows.TableName, IdRows.ColumnName, IdRows.[value], -10 AS ColumnSequence, 'int IDENTITY(1,1) PRIMARY KEY' as datatype, null as datatypeprecision FROM IdRows 
UNION ALL SELECT FKRows.parentLevel, FKRows.parentTableName, FKRows.[level], FKRows.[type], FKRows.TableName, FKRows.ColumnName, FKRows.[value], -9 AS ColumnSequence, 'int' as datatype, null as datatypeprecision FROM FKRows 

-- For debugging:
-- SELECT * FROM ##parsedJson ORDER BY ParentLevel, level, tablename, columnsequence

DECLARE @CreateStatements nvarchar(max);

SELECT
    @CreateStatements = COALESCE(@CreateStatements + CHAR(13) + CHAR(13), '') + 
    'CREATE TABLE ' + @Schema + '.' + TableName + CHAR(13) + '(' + CHAR(13) +
        STRING_AGG( ColumnName + ' ' + DataType + ISNULL('('+CAST(DataTypePrecision AS nvarchar(20))+')','') +  CASE WHEN DataType like '%PRIMARY KEY%' THEN '' ELSE ' NULL' END, ','+CHAR(13)) WITHIN GROUP (ORDER BY ColumnSequence) 
    + CHAR(13)+')'
FROM
    (SELECT DISTINCT 
        j.TableName, 
        j.ColumnName,
        MAX(j.ColumnSequence) AS ColumnSequence, 
        j.DataType, 
        j.DataTypePrecision, 
        j.[level] 
    FROM 
        ##parsedJson j
        CROSS APPLY (SELECT TOP 1 ParentTableName + 'Id' AS ColumnName FROM ##parsedJson p WHERE j.TableName = p.TableName ) p
    GROUP BY
        j.TableName, j.ColumnName,p.ColumnName, j.DataType, j.DataTypePrecision, j.[level] 
    ) j
GROUP BY
    TableName


PRINT @CreateStatements;

In the variables section, we can define our input JSON document string as well as define things like a root table name and default database schema name.

There is also a string padding variable.  This padding variable's value is added to the max value length found in each column being generated, giving each column a little bit more breathing room.

Next in the script is the recursive CTE that parses the JSON string.  The OPENJSON() function in SQL Server makes this part relatively easy since some of the work of determining datatypes is already done for you.
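
If you haven't used OPENJSON's default schema before, this is the shape of data the CTE works with; the [type] column is what drives the datatype decisions (run it against the sample @JsonData above to see for yourself):

-- OPENJSON's default schema returns key/value pairs plus a [type] code:
-- 0 = null, 1 = string, 2 = number, 3 = true/false, 4 = array, 5 = object
SELECT [key], [value], [type]
FROM OPENJSON(@JsonData, '$');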

I've taken the liberty of converting all strings to nvarchar types, numbers to either floats or ints, booleans to bits, and datetime strings to datetime2s.

Two additional CTE expressions add an integer IDENTITY PRIMARY KEY column to each table as well as a column referencing the parent table if applicable (our foreign key column).

Finally, a little bit of dynamic SQL pieces together all of these components to generate our CREATE TABLE scripts.

Limitations

I created this code with a lot of assumptions about my (unfamiliar) JSON data sets.  For the purpose of roughly building out tables from large JSON files, I don't need the results to be perfect and production-ready; I just want the results to be mostly correct so the vast majority of tedious table creation work is automated.

With that disclaimer made, here are a few things to be aware of:

  • Sometimes there will be duplicate column names generated because of naming - just delete one.
  • While foreign key columns exist, the foreign key constraints don't.
  • This code uses STRING_AGG.  I'll leave it up to you to convert to STUFF and FOR XML PATH if you need to run it in versions prior to 2017 (a rough sketch of that pattern is below).
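
If you do need the pre-2017 version, the usual replacement pattern for STRING_AGG looks roughly like this - a generic sketch with a placeholder #ColumnList table (TableName, ColumnDefinition, ColumnSequence), not a drop-in edit of the script above:

-- Generic pre-2017 string aggregation using STUFF and FOR XML PATH
-- (#ColumnList and its columns are placeholders for whatever holds your column definitions)
SELECT
    c1.TableName,
    STUFF((
        SELECT ',' + CHAR(13) + c2.ColumnDefinition
        FROM #ColumnList c2
        WHERE c2.TableName = c1.TableName
        ORDER BY c2.ColumnSequence
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, '') AS ColumnDefinitions
FROM #ColumnList c1
GROUP BY c1.TableName;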

Summary

This script is far from perfect.  But it has eliminated the need for me to build out these tables and columns from scratch.  Sure, the output sometimes needs a tweak or two, but for my purposes I'm happy with how it turned out.  I hope it helps you eliminate some boring table creation work too.

ʼ;ŚℇℒℇℂƮ *: How Unicode Homoglyphs Can Thwart Your Database Security


For the past couple weeks I've been writing about how to protect your database from a SQL injection attack.  Today, we will keep the trend going by looking at how implicit unicode conversions can leave your data vulnerable.

Watch this week's video on YouTube

What's a homoglyph?

A homoglyph is a character that looks like another character: l (lowercase "L") and 1 (the number one) are considered homoglyphs.  So are O (the letter) and 0 (the number zero).

Homoglyphs can exist within a character set (like the Latin character set examples above) or they can exist between character sets.  For example, you may have the unicode apostrophe ʼ, which is a homoglyph of the Latin single quote character '.

How does SQL Server handle unicode homoglyphs?

Funny you should ask.  If you pass in a unicode character to a non-unicode datatype (like char), SQL implicitly converts the unicode character to its closest resembling non-unicode homoglyph.

To see this in action, we can use the unicode apostrophe from the example above:

SELECT 
    CAST(N'ʼ' AS nchar) AS UnicodeChar, 
    CAST(N'ʼ' AS char) AS NonUnicodeChar

You can see in the second column SQL automatically converted the apostrophe to a single quote:


Although this implicit character conversion can be convenient for when you want to display unicode characters in a non-unicode character set, it can spell disaster for your SQL Server security.

Unicode Homoglyph SQL Injection

If you are already using sp_executesql or QUOTENAME() when building your dynamic SQL queries then you are safe from this type of SQL injection.

I know you aren't the kind of person who would ever write your own security functions when solid, safe, and tested functions like the above are available.  However, just this one time let's pretend you think you can outsmart a hacker by writing your own quote delimiting code.

Using the same dataset as last week, let's create a new stored procedure that is going to return some data from a user's profile:

DROP PROCEDURE IF EXISTS dbo.GetProfile
GO
CREATE PROCEDURE dbo.GetProfile
    @Username nvarchar(100)
AS
BEGIN
    -- Add quotes to escape injection...or not?
    SET @Username = REPLACE(@Username, '''','''''')

    DECLARE @Query varchar(max)

    SET @Query = 'SELECT 
                    FullName, 
                    JoinDate
                FROM
                    dbo.RegisteredUser
                WHERE
                    UserName = ''' + @Username + '''
                    '

    EXEC(@Query)
END
GO

Instead of using sp_executesql or QUOTENAME(), let's try to write our own clever REPLACE() function that will replace single quotes with two sets of single quotes.  This should, in theory, prevent SQL injection.

If we test out a "normal" attempt at SQL injection, you'll notice this logic works great.  Give yourself a pat on the back!


However, if we pass in a unicode apostrophe, the injection slips right through:


This happens because we declared our @Query variable as varchar instead of the unicode nvarchar.  When we build our dynamic SQL statement, SQL Server implicitly converts the nvarchar @Username parameter to non-unicode varchar:

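You can watch the conversion defeat the home-grown escaping without the stored procedure at all; here's a minimal repro using a toy payload (NCHAR(700) is the unicode apostrophe, U+02BC):

-- The unicode apostrophe is not a Latin single quote, so REPLACE leaves it alone
DECLARE @Username nvarchar(100) = NCHAR(700) + N' OR 1=1 --';
SET @Username = REPLACE(@Username, '''', '''''');

-- Assigning the concatenated nvarchar expression to a varchar variable
-- implicitly converts U+02BC into a plain single quote
DECLARE @Query varchar(max) = 'WHERE UserName = ''' + @Username + '''';
SELECT @Query AS InjectedQuery; -- the converted quote terminates the string literal early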

So if I replace apostrophes, will that make me safe?

No.

I know it seems like blacklisting/replacing the unicode apostrophe would solve all of our problems.

And it would...in this scenario only.  There are more unicode homoglyphs than just an apostrophe, though.

Out of curiosity I wrote a script to search through the unicode character space to see what other homoglyphs exist:

DECLARE @FirstNumber INT=0;
-- number of possible characters in the unicode space
DECLARE @LastNumber INT=1114112;

WITH Numbers AS (
    SELECT @FirstNumber AS n
    UNION ALL
    SELECT n+1 FROM Numbers WHERE n+1<=@LastNumber
), UnicodeConversion AS (
SELECT
       n AS CharacterNumber,
       CASE CAST(NCHAR(n) as CHAR(1))
              WHEN '''' THEN NCHAR(n)
              WHEN ';' THEN NCHAR(n)
       END AS UnicodeCharacter,
       CAST(NCHAR(n) as CHAR(1)) AS ASCIICharacter
FROM Numbers
)
SELECT
       *
FROM
       UnicodeConversion
WHERE
       UnicodeCharacter IS NOT NULL
OPTION (MAXRECURSION 0)


Although the characters returned by this query might look similar, they are actually homoglyphs.

I decided to only search for single quotes and semicolons since they are frequently used in SQL injection attacks, but this is by no means an extensive list of all of the characters you would want to blacklist.

Not only would it be very difficult to confidently blacklist every dangerous homoglyph, but new characters are being added to unicode all of the time, so maintaining a blacklist would be a maintenance nightmare.  Especially if the person maintaining this code in the future isn't familiar with these types of injection attacks.

And don't be cheeky thinking you can filter out dangerous SQL keywords either - even if you REPLACE(@Username,'SELECT',''), just remember someone can come by and pass in a value like 'ŚεℒℇℂƮ'.

Conclusion

Don't write your own security functions - they will fail.

Your best protection against SQL injection is to not use dynamic SQL.  If you have to use dynamic SQL, then use sp_executesql or QUOTENAME().
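
To make that concrete, here is what the earlier GetProfile procedure might look like rewritten with sp_executesql and a real parameter (a sketch against the same demo table as above):

DROP PROCEDURE IF EXISTS dbo.GetProfile
GO
CREATE PROCEDURE dbo.GetProfile
    @Username nvarchar(100)
AS
BEGIN
    -- Keep the statement nvarchar and let sp_executesql pass the value as a true parameter
    DECLARE @Query nvarchar(max) = N'SELECT
                    FullName,
                    JoinDate
                FROM
                    dbo.RegisteredUser
                WHERE
                    UserName = @Username';

    EXEC sp_executesql @Query, N'@Username nvarchar(100)', @Username = @Username;
END
GO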