
Building Dynamic Table-Driven Queries


This post is a response to this month's T-SQL Tuesday #104 prompt by me! T-SQL Tuesday is a way for SQL Server bloggers to share ideas about different database and professional topics every month.

This month's topic is asking what code would you hate to live without?


Watch this week's video on YouTube

When given the choice between working on new projects versus maintaining old ones, I'm always more excited to work on something new.

That means that when I build something that is going to be used for years to come, I try to build it so that it will require as little maintenance as possible in the future.

One technique I use for minimizing maintenance is making my queries dynamic.  Dynamic queries, while not right for every situation, do one thing really well: they allow you to modify functionality without needing a complete rewrite when your data changes.  The way I look at it, it's much easier to add rules and logic to rows in a table than to modify a table's columns or structure.

To show you what I mean, let's say I want to write a query selecting data from model.sys.database_permissions:

SELECT class
      ,class_desc
      ,major_id
      ,minor_id
      ,grantee_principal_id
      ,grantor_principal_id
      ,type
      ,permission_name
      ,state
      ,state_desc
  FROM model.sys.database_permissions

Writing the query as above is pretty simple, but it isn't flexible in case the table structure changes in the future or if we want to programmatically write some conditions.

Instead of hardcoding the query as above, here is a general pattern I use for writing dynamic table-driven queries.  SQL Server has the handy views sys.all_views and sys.all_columns that show information about what columns are stored in each table/view:

(Screenshot: sample results from sys.all_views and sys.all_columns)
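
If you want to poke at that metadata yourself, a query along these lines (a quick sketch of mine, not necessarily the exact query behind the original screenshot) lists the columns that make up sys.database_permissions:

SELECT
    s.[name] AS SchemaName,
    o.[name] AS ViewName,
    c.[name] AS ColumnName,
    c.column_id
FROM
    sys.all_views o
    INNER JOIN sys.schemas s
        ON o.schema_id = s.schema_id
    INNER JOIN sys.all_columns c
        ON o.object_id = c.object_id
WHERE
    o.[name] = 'database_permissions'
ORDER BY
    c.column_id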

Using these two views, I can use this dynamic SQL pattern to build the same exact query as above:

-- Declare some variables up front
DECLARE 
    @FullQuery nvarchar(max),
    @Columns nvarchar(max),
    @ObjectName nvarchar(128)

-- Build our SELECT statement and schema+table name
SELECT 
    @Columns = COALESCE(@Columns + ', ', '') + '[' + c.[name] + ']',
    @ObjectName = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
FROM 
    sys.all_views o  
    INNER JOIN sys.schemas s
        ON o.schema_id = s.schema_id
    INNER JOIN sys.all_columns c
        ON o.object_id = c.object_id
WHERE 
    o.[name] = 'database_permissions'
ORDER BY
    c.column_id 

-- Put all of the pieces together and execute
SET @FullQuery = 'SELECT ' + @Columns + ' FROM ' + @ObjectName

EXEC(@FullQuery)

The way building a dynamic statement like this works is that I build my SELECT statement as a string based on the values stored in the sys.all_columns view.  If a column is ever added to that view, my dynamic code will handle it (I wouldn't expect this particular view to change much in future versions of SQL Server, but in other real-world tables I can regularly expect the data to change).

Yes, writing certain queries dynamically like this means more up front work.  It also means some queries won't run to their full potential (not necessarily reusing plans, not tuning every individual query, needing to be thoughtful about SQL injection attacks, etc...).  There are A LOT of downsides to building queries dynamically like this.
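
As a quick aside (this snippet is mine, not part of the original pattern): when the pieces of a dynamic query come from user input rather than system views, wrapping any concatenated identifiers in QUOTENAME and passing values through sp_executesql parameters helps keep injection in check:

-- A minimal sketch, not from the original post: two habits that reduce injection risk
DECLARE @SchemaName sysname = N'sys';
DECLARE @ViewName   sysname = N'all_views';

-- 1) Identifiers that have to be concatenated into the string get QUOTENAME...
DECLARE @Sql nvarchar(max) =
    N'SELECT TOP (@RowCount) * FROM '
    + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@ViewName) + N';';

-- 2) ...while values travel as real parameters through sp_executesql
EXEC sp_executesql @Sql, N'@RowCount int', @RowCount = 5;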

But dynamically built queries make my systems flexible and drastically reduce the amount of work I have to do down the road.  In the next few weeks I hope to go into this type of dynamically built, table-driven process in more detail (so you should see the pattern in the example above get reused soon!).

Code You Would Hate To Live Without (T-SQL Tuesday #104 Invitation)


The recent news about Microsoft acquiring GitHub has me thinking about how amazing it is for us to be part of today's online code community.

Before modern online programming communities, finding good code samples or sharing your own code was challenging.  Forums and email lists (if searchable) were good, but beyond that you had to rely on books, coworkers, and maybe a local meetup of like-minded individuals to help you work through your programming problems.

Watch this week's video on YouTube

Today, accessing and using code from the internet is second nature - I almost always first look online to see if a good solution already exists.  At the very least, searching blogs, GitHub, and StackOverflow for existing code is a great way to generate ideas.

For this month's T-SQL Tuesday, I want you to write about code you've written that you would hate to live without.

Maybe you built a maintenance script to free up disk space, wrote a query to gather system stats for monitoring, or coded some PowerShell to clean up string data.  Your work doesn't need to be completely original either - maybe you've improved the code in some open source project to better solve the problem for your particular situation.

There's probably someone out there in the world who is experiencing the same problem that you have already solved; let's make their life a little easier by sharing.

And don't worry if your code isn't perfect - just explain how your solution works and if you are aware of any caveats.  If it's not an exact solution for someone else's problem, at the very least it may help them generate some ideas.

Finally, here's a reminder of the official rules for T-SQL Tuesday:

  1. Publish your contribution on Tuesday, July 10, 2018. Let's use the "it's Tuesday somewhere" rule.
  2. Include the T-SQL Tuesday Logo and have it link to this post.
  3. Please comment below with a link to your post (trackbacks/pingbacks should work too but...comments ensure I don't miss your post).
  4. Tweet about your post using #tsql2sday.
  5. If you'd like to host in the future, contact Adam Machanic.

Join Elimination: When SQL Server Removes Unnecessary Tables

In this post and video at SQLPerformance.com, I discuss what join elimination is and how it works in SQL Server.  Join elimination is one of the many techniques that the SQL Server query optimizer uses to create efficient query plans. Specifically, join elimination occurs when SQL Server can establish equality by using query logic or trusted database constraints to eliminate unnecessary joins... Read more or watch the video
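
As a quick illustration (my own sketch against the WideWorldImporters sample database, not code from the linked post), a join can disappear from the plan when a trusted foreign key guarantees every row will match and no columns from the joined table are referenced:

-- A hedged sketch: assumes WideWorldImporters and its trusted foreign key from
-- Sales.InvoiceLines.InvoiceID to Sales.Invoices.InvoiceID.  Because no columns
-- from Sales.Invoices are selected, the optimizer can eliminate that join entirely.
SELECT
    il.InvoiceLineID,
    il.Quantity
FROM
    Sales.InvoiceLines il
    INNER JOIN Sales.Invoices i
        ON il.InvoiceID = i.InvoiceID;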

4 Ways To Define Lookup Values In A Query

Watch this week's video on YouTube

Occasionally you may need to create a lookup table of values for a single query.  Building a permanent table of these values is a good option if you need to reuse the values frequently, but for one-off, ad-hoc queries you may want to hard-code the lookup table of values directly in your query.

We can hard-code these values using several different techniques.  Below are the techniques I've seen used before, going in order from my least to most favorite.

Table Variables

DECLARE @T TABLE 
(
    ColorName varchar(20), 
    HexCode char(7)
);

INSERT INTO @T VALUES 
        ('FireBrick','#B22222'),
        ('HotPink','#FF69B4'),
        ('Tomato','#FF6347'),
        ('PapayaWhip','#FFEFD5'),
        ('RebeccaPurple','#663399'),
        ('LawnGreen','#7CFC00'),
        ('MidnightBlue','#191970');

SELECT * FROM @T;

Table variables get a bad rap in SQL Server because they don't provide accurate row estimates to the query optimizer.  This can cause SQL Server to come up with some really terrible execution plans that will kill your query's performance.
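
If you do reach for a table variable in a bigger query, one common workaround (my aside, not from the original post) is OPTION (RECOMPILE), which lets the optimizer see the actual row count at execution time:

-- A hedged sketch, not from the original post: without a recompile, the optimizer
-- typically assumes a table variable holds very few rows.
SELECT *
FROM @T
OPTION (RECOMPILE);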

However, if your use case is a single small lookup table of less than 100 records, table variables might actually be a viable option.  They are quick and easy to set up and can be added to the top of your query.

With that said, I don't think I've ever used a table variable in this type of scenario (or any scenario really).  I know some people love using them and I think that's fine as long as you are keeping track of your query performance.  For me though, there are so many better options available...

Temporary Tables

CREATE TABLE #T 
(
    ColorName varchar(20), 
    HexCode char(7)
);

INSERT INTO #T VALUES 
        ('FireBrick','#B22222'),
        ('HotPink','#FF69B4'),
        ('Tomato','#FF6347'),
        ('PapayaWhip','#FFEFD5'),
        ('RebeccaPurple','#663399'),
        ('LawnGreen','#7CFC00'),
        ('MidnightBlue','#191970');

SELECT * FROM #T;

Temp tables are the answer to many of the table variable's shortcomings.

Temp tables can perform well with larger amounts of data because they can be indexed and can have statistics generated on them.  Both of these features typically help SQL Server generate better execution plans.
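
For example, a quick sketch (my addition, not from the original post) of indexing the #T temp table created above:

-- A sketch, not from the original post: add an index to the temp table defined above
CREATE CLUSTERED INDEX CIX_T_ColorName ON #T (ColorName);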

There is some overhead in coding a temp table though: just like a table variable, a temp table needs to be created and populated before you can use it in your query.  While normally not a huge deal, this is not something I want to have to do in those instances where I just want to define some lookup values quickly...

SELECT with UNION ALL

SELECT
    *
FROM
    (
        SELECT 'FireBrick','#B22222'     UNION ALL
        SELECT 'HotPink','#FF69B4'   UNION ALL
        SELECT 'Tomato','#FF6347'    UNION ALL
        SELECT 'PapayaWhip','#FFEFD5'    UNION ALL
        SELECT 'RebeccaPurple','#663399' UNION ALL
        SELECT 'LawnGreen','#7CFC00'     UNION ALL
        SELECT 'MidnightBlue','#191970'
    ) T(ColorName,HexCode);

The next option is hard-coding values in SELECT statements and then joining them together with UNION ALLs.

This is probably the most common technique I see, and for good reason: the syntax is straightforward and doesn't require any special setup; perfect for the one-time use ad-hoc scenario.

Its format also makes it easy to use the ALT + highlight shortcut to quickly create a derived table lookup from the results of another query or values copied from elsewhere.

I do like this method a lot, but there is one method that I like slightly more...

The VALUES() Constructor

SELECT
    *
FROM
    (VALUES 
        ('FireBrick','#B22222'),
        ('HotPink','#FF69B4'),
        ('Tomato','#FF6347'),
        ('PapayaWhip','#FFEFD5'),
        ('RebeccaPurple','#663399'),
        ('LawnGreen','#7CFC00'),
        ('MidnightBlue','#191970')
        ) T(ColorName,HexCode);

You've probably used the VALUES constructor in an INSERT INTO statement, but did you know you can also use it in a FROM clause?

This syntax is similar to our SELECT + UNION ALL technique above, except we swap the SELECT and UNION ALL keywords for commas and parentheses.  I find this slightly easier to write, if only because it requires typing out fewer characters and feels more programmatic.

One Last Tip: CTE

WITH HtmlColors AS (
        SELECT 'FireBrick' AS ColorName,'#B22222' AS HexCode     UNION ALL
        SELECT 'HotPink','#FF69B4'   UNION ALL
        SELECT 'Tomato','#FF6347'    UNION ALL
        SELECT 'PapayaWhip','#FFEFD5'    UNION ALL
        SELECT 'RebeccaPurple','#663399' UNION ALL
        SELECT 'LawnGreen','#7CFC00'     UNION ALL
        SELECT 'MidnightBlue','#191970'
)

SELECT * FROM HtmlColors

This isn't really an additional technique, but something related that I use often and feels appropriate to mention.

If using either the SELECT + UNION ALL or VALUES techniques, you can put those lookup queries into a common table expression for easier referencing.

This doesn't give any performance advantage, but it does help keep your code clean by putting your lookup logic right at the top of your file.  This becomes particularly useful when using those hard-coded lookup values as parameters, allowing all changes to your query to be made right at the top of your file during subsequent runs.
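
To illustrate that last point (this example is mine, and the widget rows are invented purely for demonstration), the CTE acts like a parameter block at the top of the file, and everything below it just references the lookup:

WITH HtmlColors AS (
    SELECT ColorName, HexCode
    FROM (VALUES
        ('FireBrick','#B22222'),
        ('MidnightBlue','#191970')
    ) T(ColorName, HexCode)
)
SELECT
    w.WidgetName,
    c.HexCode
FROM
    (VALUES
        ('Alpha','FireBrick'),
        ('Bravo','MidnightBlue'),
        ('Charlie','Tomato')
    ) w(WidgetName, LabelColor)
    INNER JOIN HtmlColors c
        ON w.LabelColor = c.ColorName;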

Converting JSON to SQL Server CREATE TABLE Statements

Watch this week's video on YouTube

Tedious, repetitive tasks are the bane of any lazy programmer.  I know, because I am one.

One such repetitive task that I find comparable to counting grains of rice is building database layouts from JSON data sources.

While some online services exist that will parse JSON objects into database structures, I don't like using them because I don't trust the people running those sites with my data.  Nothing personal against them, I just don't want to be passing my data through their servers.

My solution to this problem was to write a query that will parse my unfamiliar JSON documents into a series of CREATE TABLE statements.

Automatically Generating A SQL Database Schema From JSON

You can always get the most recent version of the query from GitHub, but I'll post the current version below so that it's easier to explain in this post:

/*
This code takes a JSON input string and automatically generates
SQL Server CREATE TABLE statements to make it easier
to convert serialized data into a database schema.

It is not perfect, but should provide a decent starting point when working with new JSON files.

A blog post with more information can be found at https://bertwagner.com/2018/05/22/converting-json-to-sql-server-create-table-statements/
*/
SET NOCOUNT ON;

DECLARE 
    @JsonData nvarchar(max) = '
        {
            "Id" : 1,
            "IsActive":true,
            "Ratio": 1.25,
            "ActivityArray":[true,false,true],
            "People" : ["Jim","Joan","John","Jeff"],
            "Places" : [{"State":"Connecticut", "Capitol":"Hartford", "IsExpensive":true},{"State":"Ohio","Capitol":"Columbus","MajorCities":["Cleveland","Cincinnati"]}],
            "Thing" : { "Type":"Foo", "Value" : "Bar" },
            "Created_At":"2018-04-18T21:25:48Z"
        }',
    @RootTableName nvarchar(4000) = N'AppInstance',
    @Schema nvarchar(128) = N'dbo',
    @DefaultStringPadding smallint = 20;

DROP TABLE IF EXISTS ##parsedJson;
WITH jsonRoot AS (
    SELECT 
        0 as parentLevel, 
        CONVERT(nvarchar(4000),NULL) COLLATE Latin1_General_BIN2 as parentTableName, 
        0 AS [level], 
        [type] ,
        @RootTableName COLLATE Latin1_General_BIN2 AS TableName,
        [key] COLLATE Latin1_General_BIN2 as ColumnName,
        [value],
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColumnSequence
    FROM 
        OPENJSON(@JsonData, '$')
    UNION ALL
    SELECT 
        jsonRoot.[level] as parentLevel, 
        CONVERT(nvarchar(4000),jsonRoot.TableName) COLLATE Latin1_General_BIN2, 
        jsonRoot.[level]+1, 
        d.[type],
        CASE WHEN jsonRoot.[type] IN (4,5) THEN CONVERT(nvarchar(4000),jsonRoot.ColumnName) ELSE jsonRoot.TableName END COLLATE Latin1_General_BIN2,
        CASE WHEN jsonRoot.[type] IN (4) THEN jsonRoot.ColumnName ELSE d.[key] END,
        d.[value],
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColumnSequence
    FROM 
        jsonRoot
        CROSS APPLY OPENJSON(jsonRoot.[value], '$') d
    WHERE 
        jsonRoot.[type] IN (4,5) 
), IdRows AS (
    SELECT 
        -2 as parentLevel,
        null as parentTableName,
        -1 as [level],
        null as [type],
        TableName as Tablename,
        TableName+'Id' as columnName, 
        null as [value],
        0 as columnsequence
    FROM 
        (SELECT DISTINCT tablename FROM jsonRoot) j
), FKRows AS (
    SELECT 
        DISTINCT -1 as parentLevel,
        null as parentTableName,
        -1 as [level],
        null as [type],
        TableName as Tablename,
        parentTableName+'Id' as columnName, 
        null as [value],
        0 as columnsequence
    FROM 
        (SELECT DISTINCT tableName,parentTableName FROM jsonRoot) j
    WHERE 
        parentTableName is not null
)
SELECT 
    *,
    CASE [type]
        WHEN 1 THEN 
            CASE WHEN TRY_CONVERT(datetime2, [value], 127) IS NULL THEN 'nvarchar' ELSE 'datetime2' END
        WHEN 2 THEN 
            CASE WHEN TRY_CONVERT(int, [value]) IS NULL THEN 'float' ELSE 'int' END
        WHEN 3 THEN 
            'bit'
        END COLLATE Latin1_General_BIN2 AS DataType,
    CASE [type]
        WHEN 1 THEN 
            CASE WHEN TRY_CONVERT(datetime2, [value], 127) IS NULL THEN MAX(LEN([value])) OVER (PARTITION BY TableName, ColumnName) + @DefaultStringPadding ELSE NULL END
        WHEN 2 THEN 
            NULL
        WHEN 3 THEN 
            NULL
        END AS DataTypePrecision
INTO ##parsedJson
FROM jsonRoot
WHERE 
    [type] in (1,2,3)
UNION ALL SELECT IdRows.parentLevel, IdRows.parentTableName, IdRows.[level], IdRows.[type], IdRows.TableName, IdRows.ColumnName, IdRows.[value], -10 AS ColumnSequence, 'int IDENTITY(1,1) PRIMARY KEY' as datatype, null as datatypeprecision FROM IdRows 
UNION ALL SELECT FKRows.parentLevel, FKRows.parentTableName, FKRows.[level], FKRows.[type], FKRows.TableName, FKRows.ColumnName, FKRows.[value], -9 AS ColumnSequence, 'int' as datatype, null as datatypeprecision FROM FKRows 

-- For debugging:
-- SELECT * FROM ##parsedJson ORDER BY ParentLevel, level, tablename, columnsequence

DECLARE @CreateStatements nvarchar(max);

SELECT
    @CreateStatements = COALESCE(@CreateStatements + CHAR(13) + CHAR(13), '') + 
    'CREATE TABLE ' + @Schema + '.' + TableName + CHAR(13) + '(' + CHAR(13) +
        STRING_AGG( ColumnName + ' ' + DataType + ISNULL('('+CAST(DataTypePrecision AS nvarchar(20))+')','') +  CASE WHEN DataType like '%PRIMARY KEY%' THEN '' ELSE ' NULL' END, ','+CHAR(13)) WITHIN GROUP (ORDER BY ColumnSequence) 
    + CHAR(13)+')'
FROM
    (SELECT DISTINCT 
        j.TableName, 
        j.ColumnName,
        MAX(j.ColumnSequence) AS ColumnSequence, 
        j.DataType, 
        j.DataTypePrecision, 
        j.[level] 
    FROM 
        ##parsedJson j
        CROSS APPLY (SELECT TOP 1 ParentTableName + 'Id' AS ColumnName FROM ##parsedJson p WHERE j.TableName = p.TableName ) p
    GROUP BY
        j.TableName, j.ColumnName,p.ColumnName, j.DataType, j.DataTypePrecision, j.[level] 
    ) j
GROUP BY
    TableName


PRINT @CreateStatements;

In the variables section, we can define our input JSON document string as well as define things like a root table name and default database schema name.

There is also a string padding variable.  This padding value is added to the maximum value length found in each string column being generated, giving each column a little more breathing room.  For example, if the longest value found in a column is 15 characters and @DefaultStringPadding is 20, that column will be generated as nvarchar(35).

Next in the script is the recursive CTE that parses the JSON string.  The OPENJSON() function in SQL Server makes this part relatively easy since some of the work of determining datatypes is already done for you.

I've taken the liberty of converting all strings to nvarchar types, numbers to either floats or ints, booleans to bits, and datetime strings to datetime2s.

Two additional CTE expressions add an integer IDENTITY PRIMARY KEY column to each table as well as a column referencing the parent table if applicable (our foreign key column).

Finally, a little bit of dynamic SQL pieces together all of these components to generate our CREATE TABLE scripts.

Limitations

I created this code with a lot of assumptions about my (unfamiliar) JSON data sets.  For the purpose of roughly building out tables from large JSON files, I don't need the results to be perfect and production-ready; I just want the results to be mostly correct so the vast majority of tedious table creation work is automated.

With that disclaimer made, here are a few things to be aware of:

  • Sometimes there will be duplicate column names generated because of naming collisions - just delete one of them.
  • While foreign key columns exist, the foreign key constraints don't.
  • This code uses STRING_AGG.  I'll leave it up to you to convert to STUFF and FOR XML PATH if you need to run it in versions prior to 2017.
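
If you are on an older version, the substitution looks roughly like this (a self-contained sketch using a made-up @Colors table variable, not part of the script above):

-- A hedged sketch of the general STRING_AGG -> STUFF + FOR XML PATH substitution
DECLARE @Colors TABLE (ColorName varchar(20));
INSERT INTO @Colors VALUES ('FireBrick'),('HotPink'),('Tomato');

-- SQL Server 2017+
SELECT STRING_AGG(ColorName, ',' + CHAR(13)) WITHIN GROUP (ORDER BY ColorName)
FROM @Colors;

-- Prior versions
SELECT STUFF(
    (SELECT ',' + CHAR(13) + ColorName
     FROM @Colors
     ORDER BY ColorName
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
    1, 2, '');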

Summary

This script is far from perfect.  But it has eliminated the need for me to build out these tables and columns from scratch.  Sure, the output sometimes needs a tweak or two, but for my purposes I'm happy with how it turned out.  I hope it helps you eliminate some boring table creation work too.