Data with Bert - dynamic queries

Today I want to share with you an interesting observation I made about SELECT expression execution order.

I was working on writing a dynamic SQL query that would transform the following piece of JSON:

{"data":[["a","b","c"],["d","e","f"]]}

Into a query that looked like this:

SELECT 'a' AS Row0Column0, 'b' AS Row0Column1, 'c' AS Row0Column2 
UNION ALL
SELECT 'd' AS Row1Column0, 'e' AS Row1Column1, 'f' AS Row1Column2

Normally I would use something like OPENJSON and PIVOT to transform the original data into a table result set, but in this instance I my requirements dictated that I needed to build the code as a series of SELECT and UNION ALL statements.

The first step in building this query was using OPENJSON to parse the JSON data into rows and value arrays:

SELECT
    *
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[Value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

This first query was a good start. I then added a variable @RowQuery and started building my dynamic SQL code to generate my SELECT and UNION ALL statements:

DECLARE 
    @RowQuery varchar(max)

/* TOP is here to get the ORDER BY to work as expected */
SELECT TOP 134960239460263
    @RowQuery =  COALESCE (@RowQuery + ' ','') + '''' + v.[value] + ''' as [Row'+r.RowNumber+'Column'+v.[key]+']'
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[Value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

PRINT 'SELECT ' + @RowQuery;

At this point I had the row/column numbering correct, but I still needed to add a UNION ALL SELECT before the start of each row.

I thought, "Oh, this is easy. Since the dynamic SQL I'm building is basically a loop, I need to check for a change in the RowNumber column's value to identify I'm on a new row. If I am, I can insert the UNION ALL SELECT text and I'll be all set":

DECLARE 
    @RowQuery varchar(max),
    @CurrentRow int = 0;

/* TOP is here to get the ORDER BY to work as expected */
SELECT TOP 134960239460263
    @RowQuery =  COALESCE (@RowQuery + '','') + IIF(r.RowNumber > @CurrentRow, CHAR(10)+'UNION ALL'+CHAR(10)+'SELECT ', ', ')+'''' + v.[value] + ''' as [Row'+r.RowNumber+'Column'+v.[key]+']',
    @CurrentRow = IIF(r.RowNumber > @CurrentRow, r.RowNumber, @CurrentRow)
FROM
(
SELECT
    rows.[key] AS RowNumber,
    rows.[Value] AS RowArray
FROM 
    OPENJSON(N'{"data":[["a","b","c"],["d","e","f"]]}','$.data') rows
) r
CROSS APPLY OPENJSON(r.RowArray) v
ORDER BY
    r.RowNumber,
    v.[key]

/*remove the first comma and add an initial SELECT */
PRINT STUFF(@RowQuery,1,1,'SELECT');

Success! But as I was celebrating my dynamic SQL victory, I realized I was making an assumption about SQL Server that I had never thought about before:

The above query only works because SQL Server is executing the variables in the SELECT list sequentially. I'm incrementing @CurrentRow only after processing my @RowQuery variable, and this logic only works correctly if SQL Server executes the variable expressions in the order they appear in the SELECT list. If SQL Server was executing items in the SELECT list in reverse or random order, @CurrentRow could potentially get set BEFORE @RowQuery was evaluated, causing the logic of adding "UNION ALL SELECT" in the right location to fail.

This surprised me because I don't usually think about the column execution order of a query. Normally column expressions in the SELECT statement are independent of each other so the order that the columns are executed in doesn't really matter. But in this example, the column execution order does matter and it's reassuring to see SQL Server do what I assumed it was doing.

Now, I can't guarantee this always works. I tried but failed to think of a scenario where SQL Server wouldn't execute the columns in sequential order. While the query seemed to work as expected in all of the tests I ran, I'll leave this observation open ended in case anyone has ever encountered a scenario or has any ideas of when SQL Server doesn't process SELECT statement expressions in the order they are listed.

This post is a response to this month's T-SQL Tuesday #104 prompt by me! T-SQL Tuesday is a way for SQL Server bloggers to share ideas about different database and professional topics every month.

This month's topic is asking what code would you hate to live without?

Watch this week's video on YouTube

When given the choice between working on new projects versus maintaining old ones, I'm always more excited to work on something new.

That means that when I build something that is going to used for years to come, I try to build it so that it will require as little maintenance as possible in the future.

One technique I use for minimizing maintenance is making my queries dynamic. Dynamic queries, while not right for every situation, do one thing really well: they allow you to modify functionality without needing a complete rewrite when your data changes. The way I look it, it's much easier to add rules and logic to rows in table than having to modify a table's columns or structure.

To show you what I mean,let's say I want to write a query selecting data from model.sys.database_permissions:

SELECT class
      ,class_desc
      ,major_id
      ,minor_id
      ,grantee_principal_id
      ,grantor_principal_id
      ,type
      ,permission_name
      ,state
      ,state_desc
  FROM model.sys.database_permissions

Writing the query as above is pretty simple, but it isn't flexible in case the table structure changes in the future or if we want to programmatically write some conditions.

Instead of hardcoding the query as above, here is a general pattern I use for writing dynamic table-driven queries. SQL Server has the handy views sys.all_views and sys.all_columns that show information about what columns are stored in each table/view:

2018-07-03_21-00-45

Using these two views, I can use this dynamic SQL pattern to build the same exact query as above:

-- Declare some variables up front
DECLARE 
    @FullQuery nvarchar(max),
    @Columns nvarchar(max),
    @ObjectName nvarchar(128)

-- Build our SELECT statment and schema+table name
SELECT 
    @Columns = COALESCE(@Columns + ', ', '') + '[' + c.[name] + ']',
    @ObjectName = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
FROM 
    sys.all_views o  
    INNER JOIN sys.schemas s
        ON o.schema_id = s.schema_id
    INNER JOIN sys.all_columns c
        ON o.object_id = c.object_id
WHERE 
    o.[name] = 'database_permissions'
ORDER BY
    c.column_id 

-- Put all of the pieces together an execute
SET @FullQuery = 'SELECT ' + @Columns + ' FROM ' + @ObjectName

EXEC(@FullQuery)

The way building a dynamic statement like this works is that I build my SELECT statement as a string based on the values stored in my all_columns view. If a column is ever added to this view, my dynamic code will handle it (I wouldn't expect this view to change that much in future versions of SQL, but in other real-world tables I can regularly expect changing data).

Yes, writing certain queries dynamically like this means more up front work. It also means some queries won't run to their full potential (not necessarily reusing plans, not tuning every individual query, needing to be thoughtful about SQL injection attacks, etc...). There are A LOT of downsides to building queries dynamically like this.

But dynamically built queries make my systems flexible and drastically reduce the amount of work I have to do down the road. In the next few weeks I hope to go into this type of dynamically built, table-driven process in more detail (so you should see the pattern in the example above get reused soon!).

SELECT Expression Execution Order

Building Dynamic Table-Driven Queries