6 Techniques For Troubleshooting Your Code

MJ-t-sql-TuesdayThis post is a response to this month's T-SQL Tuesday #105 prompt by Wayne Sheffield.  T-SQL Tuesday is a way for the SQL Server community to share ideas about different database and professional topics every month.

This month's topic asks to share a time you ran into a metaphorical brick wall and how you worked it out.


Watch this week's video on YouTube

The Problem: Trimmed JSON Values

Recently I was using FOR JSON PATH to generate a JSON string from a query to use in a web app:

SELECT TOP 100 *
FROM master..spt_values
WHERE type='P'
FOR JSON PATH

The resulting JSON string is 5,580 characters long.

The goal was to read this query result into my .NET app like so:

var jsonResult = db.Database.SqlQuery<string>("SELECT TOP 100 * FROM ... FOR JSON PATH");

Is this the best way to design every app/database interaction?  Probably not, but it was right for this scenario because I didn't want to create models for all of the dynamic data I'd be returning.

Design decisions aside, my "brick wall" issue was that my "jsonResult" value was getting truncated around 2,000 characters instead of displaying the full 5,580.  The JSON string looked great in SSMS, but for the life of me I couldn't figure out why the data was getting chopped off when read into .NET.

Time to Debug

What follows are the usual steps I take when debugging a problem that has me stumped.  I've turned these into a 1950s style educational film so you can laugh at my bad jokes while your learn.

1. Rubber Duck Debugging

The first thing I usually do when I hit a wall like this is talk myself through the problem again.

This technique usually works well for me and is equivalent to those times when you ask  someone for help but realize the solution while explaining the problem to them.

To save yourself embarrassment (and to let your coworkers keep working uninterrupted), people often substitute an inanimate object, like a rubber duck, instead of a coworker to try and work out the problem on their own.

Alas, in this case explaining the problem to myself didn't help, so I moved on to the next technique.

2. Simplify the Problem

Breaking a problem down into smaller solvable parts can help sometimes.  I changed my query to return less data by switching to SELECT TOP 5 and seeing if .NET was still truncating the data.  It wasn't! Mildly successful!

In this case though, I couldn't really build off my simplified example.  As soon as my result passed ~2,000 characters, the JSON string was getting chopped off.

In this step I also figured out if I put my query into a derived table, my .NET code worked beautifully and returned the complete JSON string:

SELECT * FROM
(
SELECT TOP 100 *
FROM master..spt_values
WHERE type='P'
FOR JSON PATH
) t(c)

This was an ugly solution though and I didn't like doing it.  I especially didn't like it because I didn't know why a derived table fixed the output.

3. Check the Internet

As great as the internet is, I try to limit myself to how much searching I do on it when troubleshooting.  Searching for an online solution can quickly devolve into wasting two hours with nothing to show.

I performed some cursory searches on Google, StackOverflow, various forums, blogs, etc... but didn't find anything helpful (fun/sad fact: I searched for solutions again while typing up this post and now find plenty of solutions...who knows what I was doing wrong that day).

4. Ask a Friend

I love working through problems with others because I'm often amazed at how differently others approach a problem.  Often times this leads to a solution I would not have thought of on my own.  I especially enjoy hearing from people new to the subject area because they often have the most creative solutions due to not yet having become cynical and jaded from previous experience :).

I try to hold off on this option until at least trying all of the above techniques because 1) I hate breaking another person's concentration 2) I feel like I learn better if I struggle through a problem myself.

And in this case shopping the problem around didn't help - no one I talked to had a great solution.

5. Take a Break

After trying all of the above, I was out of ideas.  I took a break from the problem for the rest of the day, resolved to give it another try in the morning the morning.

6. RTFM

And the next morning, I had the idea to check the documentation to see what it said about the return type of FOR JSON PATH.

Embarrassingly, this one should be way higher on the list, and I'd like to say that it usually is, but for one reason or another I didn't bother checking until this late in the game.

And wouldn't you know it?  The last paragraph, of the last section, tells me exactly what I needed to know.

The documentation tells me that the JSON string will be broken up across multiple rows and my client app needs to concatenate them all together.  What I ended up doing is a simple String.Join():

var jsonResult = String.Join("",db.Database.SqlQuery<string>("SELECT TOP 100 * FROM ... FOR JSON PATH"));

There's no explanation for why SSMS is able to concatenate these values together but other client apps have to manually do so, but at least I found my documented solution.

Conclusion

Even though I found a somewhat-satisfactory solution in the documentation, my fall back was going to be to use the ugly derived table solution discovered in step 2.  It was ugly, but at some point I would have to call it quits and settle with an ugly workaround rather than spend more time on troubleshooting.

Next time I'll be sure to check the documentation earlier in the process and hopefully that will save me from some of the frustration I encountered in this particular scenario.

Should You Use Index Hints?

Watch this week's video on YouTube

One of the things that the SQL Server query optimizer does is determine how to retrieve the data requested by your query.

Usually it does a pretty good job, which is a great because if it didn't then we'd be spending most of our days programming sorting and joining algorithms instead of having fun actually working with our data.

Sometimes the query optimizer has a lapse in judgement and createds a less-than-efficient plan, requiring us to step in and save the day.

Index Hints Give You Control

One way to "fix" a poor performing plan is to use an index hint.  While we normally have no control over how SQL Server retrieves the data we requested, an index hint forces the  query optimizer to use the index specified in the hint to retrieve the data (hence, it's really more of a "command" than a "hint").

Sometimes when I feel like I'm losing control I like using an index hint to show SQL Server who's boss.  I occasionally will also use index hints when debugging poor performing queries because it allows me to confirm whether using an alternate index would improve performance without having to overhaul my code or change any other settings.

...But Sometimes That's Too Much Power

While I like using index hints for short-term debugging scenarios, that's about the only time they should be used because they can create some pretty undesirable outcomes.

For example, let's say I have this nice simple query and index here:

CREATE INDEX IX_OwnerUserId_CreationDate_Includes
ON dbo.Posts (OwnerUserId, CreationDate) INCLUDE (AcceptedAnswerId, ClosedDate, CommentCount, FavoriteCount, LastActivityDate);

SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts
WHERE
    OwnerUserId < 1000

This index was specifically created for a different query running on the Posts table, but it will also get used by the simple query above.

Executing this query without any hints causes SQL Server to use it anyway (since it's a pretty good index for the query), and we get decent performance: only 1002 logical reads.

2018-07-30_12-40-12 I wish all of my execution plans were this simple.

Let's pretend we don't trust the SQL Server optimizer to always choose this index, so instead we force it to use it by adding a hint:

SELECT
    OwnerUserId,
    AcceptedAnswerId
FROM
    dbo.Posts WITH (INDEX(IX_OwnerUserId_CreationDate_Includes))
WHERE
    OwnerUserId < 1000

With this hint, the index will perform exactly the same: 1002 logical reads, a good index seek, etc...

But what happens if in the future a better index gets added to the table?

CREATE INDEX IX_OwnerUserId_AcceptedAnswerId_Includes
ON dbo.Posts (OwnerUserId, AcceptedAnswerId) INCLUDE (LastEditorUserId, ParentId);

If we run the query WITHOUT the index hint, we'll see that SQL Server actually chooses this new index because it's smaller and we can get the data we need in only 522 logical reads:

2018-07-30_12-45-02 This execution plan looks the same, but you'll notice the smaller, more data dense index is being used.

If we had let SQL Server do it's job, it would have given us a great performing query!  Instead, we decided to intervene and hint (ie. force) it to use a sub-optimal index.

Things Can Get Worse

The above example is pretty benign - sure, without the hint SQL Server would have read about half as many pages, but this isn't a drastic difference in this scenario.

What could be disastrous is if because of the hint, the query optimizer decides to make a totally different plan that isn't nearly as efficient.  Or if one day someone drops the hinted index, causing the query with the hint to down right fail:

2018-07-30_12-50-55

Index hints  can be nice to use in the short-term for investigating, testing, and debugging.  However, they are almost never the correct long-term solution for fixing query performance.

Instead, it's better to look for the root-cause of a poor performing query: maybe you need to rebuild stats on an index or determine if the cardinality estimator being used is not ideal.  You might also benefit from rewriting a terribly written query.

Any of these options will likely help you create a better, long-term, flexible solutions rather than forcing SQL Server to use the same hard-coded, potentially sub-optimal index forever.

The Fastest Way To Locate Errors In Your SQL Query

n-120570-2

In about 60 seconds you will never debug error messages in SQL Server Management Studio the same way again.

Coming from a .NET background, I'm used to receiving relatively helpful error messages that point me to the precise location of the error in my code:

NET-Error-Message Pinpoint error finding

SQL Server Management Studio does a decent job too - **except when it doesn't. ** Different versions of SSMS provide error line numbers differently.

Watch this week's video on YouTube

In older versions (<= 2012), SSMS provides error location information relative to the batch starting point of the erroneous query.  This means if you have a window full of T-SQL batch statements, your error message will be relative to the last "GO" in the editor before the syntax error:

Error-message-in-old-SSMS

In SSMS 2014, batch start locations aren't considered:

error-message-ssms-2014

Starting in SSMS 2016, the error messages revert back to counting from the current batch but also indicate the line number where the batch begins:

Error-message-in-newer-ssms

While all of these error messages point me in the right direction, I'm too lazy to try and remember how each version of SSMS refers to the error location.  This becomes particularly problematic when jumping around environments and not being able to use a consistent version of SSMS.

There is one feature that works consistently though that makes navigating to the specific error lines easily.

You can double click on the error message to be taken directly to the line number referenced!

I know this sounds simple, but it's one of those small things I never thought of trying until I saw someone else do it.  Little did I know what I was missing out on - especially in those giant hundreds/thousands of line long scripts.  but now that I know about it it's the only way I locate my query errors.

This functionality also works if you have multiple errors listed in the Messages window.

But what about if your SQL query is all on one giant line (like if it was copied from somewhere or generated dynamically)?  You can use a regular expression to format your query first and then click on the error message to quickly navigate to the correct lines.

C#'s foreach ruined my afternoon

"Forest Fire" by CIFOR is licensed under CC BY-NC-ND 2.0

The other afternoon I ran into some nightmarish debugging with the following code:

private static void StartThreads()
{
    var values = new List<int>() { 1, 2, 3 };
    var threads = new List<Thread>();

    foreach (var value in values)
    {
        Thread t = new Thread(() => Run(value));
        threads.Add(t);
        t.Start();
    }

    // Wait for all threads to complete
    foreach (Thread t in threads)
        t.Join();
}

private static void Run(int value)
{
    Console.Write(value.ToString());
}

(I know, I know, I wish I could be using TPL but in this case I couldn't)

On my local machine, the code above ran and gave me my expected console output of 123 (your results may vary depending on what order the threads run in).

When I ran this code on my server however, the output was 333.

begin pulling out hair

Long story short, after a couple hours of investigation I figured out that the way a foreach loop works under the hood in C# ≥ 5.0, which is what I run on my local machine, works differently than a foreach loop in C# < 5.0, which is what I had on my server.

From the C# 4.0 spec, a foreach loop is really a while loop behind the scenes, meaning the code above really translates into something like this:

private static void StartThreads()
{
    var values = new List<int>() { 1, 2, 3 };
    var threads = new List<Thread>();
    IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();

    try
    {
        int v; // DECLARED OUTSIDE OF THE LOOP!!!
        while (e.MoveNext())
        {
            v = (int)(int)e.Current;
            Thread t = new Thread(() => Run(v));
            threads.Add(t);
            t.Start();
        }
    }
    finally
    {
        if (e != null) ((IDisposable)e).Dispose();
    }

    // Wait for all threads to complete
    foreach (Thread t in threads)
        t.Join();
}

The important thing to note in the above code is that int v gets declared outside of the while loop.

In the C# 5.0 spec, that int v gets declared inside the loop (causing it to get recreated with every iteration):

private static void StartThreads()
{
    var values = new List<int>() { 1, 2, 3 };
    var threads = new List<Thread>();
    IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();

    try
    {
        while (e.MoveNext())
        {
            int v; // C# 5.0 DECLARED INSIDE THE LOOP
            v = (int)(int)e.Current;
            Thread t = new Thread(() => Run(v));
            threads.Add(t);
            t.Start();
        }
    }
    finally
    {
        if (e != null) ((IDisposable)e).Dispose();
    }

    // Wait for all threads to complete
    foreach (Thread t in threads)
        t.Join();
}

Because my local machine and server were running different versions of .NET, the same exact code was producing totally different results.

Eventually I found Eric Lippert's article about the matter. Since I'm still fairly new to the world of .NET, I wasn't around for the big debate that took place in his comment's section regarding which should be the correct implementation. However, it is interesting to note that the C# devs decided to switch the logic on how the foreach loop operates so late in the game.

My eventual workaround for the .NET 3.5/C# 4.0 server was to assign the int to a newly created variable inside the foreach:

foreach (var value in values)
{
    var tempValue = value; // THE FIX
    Thread t = new Thread(() => Run(tempValue));
    threads.Add(t);
    t.Start();
}

As frustrating it may be to debug problems like this, it is nice to learn a little bit more of the language's history and idiosyncrasies.