Data with Bert logo

Three Changes You Should Always Make Before Checking In Any Code

Photo by Maico Amorim

This story originally appeared in Hacker Noon on March 11, 2017.

A code refactor always leaves me with a feeling of accomplishment. Although major refactorings are the most satisfying, every project doesn't always warrant them. Here are 3 quick and easy refactorings that I make to all of my projects in order to improve code readability:

1. Clean up formatting

The overall format of your code is what makes it possible to quickly navigate to areas of interest. Consistent indentation, line breaks, and patterns help programmers skim large chunks of code. Take the following sloppily formatted code for example:

Inventory inventory = new Inventory();
for (int i = 0; i < cars.Count; i++){
    inventory.Cars.Add(cars[i]);

    var owner = owners.Where(x => x.VIN == cars[i].VIN).OrderByDescending(x => x.PurchaseDate).FirstOrDefault();

inventory.PreviousOwners.Add(new Owner { VIN = cars[i].VIN,
                                        Name = owner.Name});
}

and compare it to this:

Inventory inventory = new Inventory();

for (int i = 0; i < cars.Count; i++)
{
    inventory.Cars.Add(cars[i]);

    var owner = owners.Where(x => x.VIN == cars[i].VIN)
                        .OrderByDescending(x => x.PurchaseDate)
                        .FirstOrDefault();

inventory.PreviousOwners.Add(new Owner 
                                        {
                                        VIN = cars[i].VIN,
                                        Name = owner.Name
                                        });
}

The second example above consistently indents lines, adds new lines, and follows consistent coding patterns. This makes it easy to skim the code quickly.

Books have chapters, headings, and paragraphs defined by formatting that make it easy to find what is needed at a glance — formatting code makes it possible to find things easily too.

2. Rename classes, methods, and variables

Classes, methods, and variables should be named in such that they help the programmer understand what is happening in the code. The shorter the scope of an object the more permissible it is to use shorter names (eg. "i" as a counter in a loop that's only a line or two long).

It's easy to use uninformative names when writing a "first draft" of your program, but using the first name that comes to mind isn't always the best choice. Take a look at the following example:

public IEnumerable<string> GetData(int year)
{
    var result = CallApi("/Cars", year);

IEnumerable<string> output = new IEnumerable<string>();

foreach(var c in result)
    {
        output.Add(c.Make);
    }
}

versus:

public IEnumerable<string> RetrieveCarMakes(int year)
{
    var inventory = CallApi("/Cars", year);

IEnumerable<string> carMakes = new IEnumerable<string>();

    foreach(var car in inventory)
    {
        carMakes.Add(car.Make);
    }
}

Using names that make sense make it much easier for someone else (or your future-self) to figure out what your code is doing.

3. Break up long expressions

When you get into a code writing groove it's easy to keep chaining commands together or using single-line syntax to speed up your writing. Often times, I look back on my code a day later and I see long expressions similar to this. Take a look at this two line expression:

bool hasHighSaleProbability = (daysOnLot < 60) ? true : (color == "Red" ? true : false);

var highSaleProbabilityVehicles = Inventory.Where(x => x.DaysOnLot < 60 or x.Color == "Red").Select(x => new { Make = x.Make, Model = x.Model, Year = x.Year }).Distinct().Select(x => new RecentInventoryView { YearDropdown = new SelectListItem { Text = x.Year, Value = x.Year }, MakeModelDropdown = new SelectListItem { Text = x.Make + " " + x.Model, Value = x.Make + " " + x.Model } });

Compared against this expression that has been broken out across multiple lines:

bool hasHighSaleProbability = false;

if (daysOnLot < 60 || color == "Red")
{
    hasHighSaleProbability = true;
}

var distinctMakesModelsYears = Inventory
    .Where(x => x.DaysOnLot < 60 or x.Color == "Red")
    .Select(x => new 
    { 
    Make = x.Make, 
    Model = x.Model, 
    Year = x.Year 
    })
    .Distinct()
    .ToList();

var recentInventoryView = distinctMakesModelsYears
    .Select(x => new RecentInventoryView 
    { 
        YearDropdown = new SelectListItem 
        { 
        Text = x.Year, 
        Value = x.Year 
        }, 
        MakeModelDropdown = new SelectListItem 
        { 
        Text = x.Make + " " + x.Model, 
        Value = x.Make + " " + x.Model 
        } 
    });

Although the first code snippet is technically shorter and has fewer lines, it is nearly unreadable. The second snippet breaks out the the if logic and breaks up all of the chained methods into more logical variables. The result? Code logic that is much easier to follow.

How to write understandable code for your future self

4de42-19uzocgt2yeq3umchqu9rza

This story originally appeared in Hacker Noon on February 10, 2017.

I can't tell you the number of times the title of this post has crossed my mind as I dug through a piece of code that I hadn't touched in years.

At the time I wrote it, I probably thought my code was beautiful. An elegant masterpiece. It should have been printed, framed, and hung on a wall of The Programming Hall of Fame. As clever as I thought I may have been a few years ago, I rarely am able to read my old code without some serious time wasted debugging.

This problem plagued me regularly. I tried different techniques to try and make my code easier to understand.

First I tried adding comments to my code. Pretty easy to do, but not all that helpful.

When comments weren't cutting it, I tried to write self-documenting code instead: small, well-named classes and methods that described their limited functionality. This made the code more readable but I would still have questions.

I also tried documenting my code in a separate file. This had its benefits but didn't solve the problem entirely either.

Eventually I figured out what I needed to do: I needed to use all three of the above techniques to write truly beautiful and understandable code.

Comments

Comments in your code should document the why, not the how. When I first started programming, I would often write very unhelpful comments like this:

public class Class1
{
  public List<string> DoWork(List<string> a)
  {
    List<string> numbers = new List<string>();

    // Loop over data
    for (int i = 1; i < a.Count; i++)
    {
      int s = a[i].IndexOf(" ");
      string num = a[i].Substring(0,s);

      // Save data
      numbers.Add(num);
    }

    return numbers;
  }
  ...
}

"Loop over data"? "Save data"? These comments are beneficial to understanding the code. I can easily tell that I have a loop, and that I'm adding my data to a collection, why should I waste valuable screen real estate with unhelpful comments?

Instead of saying what or how, comments should explain why. A programmer will see the for loop and know that it's looping over some type of collection of Addresses. However, a programmer will not know why we are starting our counter with int i = 1 — this is where adding a comment can improve the understanding of the code:

// i = 1 because the view will never display the first address
for (int i = 1; i < Addresses.Count; i++) {...

Now, we know some of the business logic driving our app. We know we don't process the first address because it never gets outputted to our view. This comment answers the why behind skipping the first address, adding clarity to the code.

Additionally, we remove the // Save data comment completely since it adds no insightful value.

Self-Documentation

Comments alone won't make code easy to reinterpret however. Let's take at our method again with our improved comments:

public class Class1
{
  public List<string> DoWork(List<string> a)
  {
    List<string> numbers = new List<string>();

    // i = 1 because the view will never display the first address
    for (int i = 1; i < a.Count; i++)
    {
      int s = a[i].IndexOf(" ");
      string num = a[i].Substring(0,s);

      numbers.Add(num);
    }

    return numbers;
  }
  ...
}

What exactly is Class1? What kind of work isDoWork() doing? What about the use of int s? The names of the objects in our code don't aid in our understanding what this code is doing.

This is where the idea of self-documenting code comes in: instead of creating objects with arbitrary, non-informative names ("I swear I'll refactor this later"), we build descriptive objects. If I have a class, its name should give me a good idea about what its properties and methods could be. A method's name should be descriptive enough to tell me what I should expect as an output without having to dig into the details of what that method is doing. Variables should add additional illumination that make what and how type comments obsolete.

In our example, let's make our code self-documenting. First, this class is intended to help us clean address data. Let's call itAddressStandardizer. With that simple renaming we know that all of the properties and methods of this class should pertain to dealing with dirty address data and making it cleaner.

What about the method name List<string> DoWork(List<string> a)? Well , I can tell you that this method is trying to parse out the number portion of a street address. So let's change the method name and signature to something more informative, like List<string> ParseHouseNumbers(List<String> addresses). Now we can make an educated guess that this method accepts some address strings as an input and and it will return a list of parsed house numbers.

If we clean up some variable names, our code becomes much easier to read, like this:

public class AddressStandardizer
{
  public List<string> DoWork(List<string> addresses)
  {
    List<string> houseNumbers = new List<string>();

    // i = 1 because the view will never display the first address
    for (int i = 1; i < addresses.Count; i++)
    {
      int firstSpaceIndex = addresses[i].IndexOf(" ");
      string houseNumber = addresses[i].Substring(0,firstSpaceIndex);

      houseNumbers.Add(houseNumber);
    }

    return houseNumbers;
  }
  ...
}

Documentation

Our code is finally starting to shape up. We have comments explaining why we chose to do something and we refactored our code to have object names that are informative.

The code at this point is ok but not perfect. If we don't look at this code for a few years, we probably have enough information now to look at the code and figure out what it's doing with relative ease.

The big piece of information that we are still missing however is knowing why this code was written in the first place.

Often times, I get a question from a manager or analyst about why we decided to build the project in the first place. Or I'll get a request for information about how the logic in the program works. Without a proper documentation file, the best thing I can do is send the business user a copy of my code. Most of the time that isn't very helpful.

What would be helpful though is an explanation of what our program is doing at a high-level. This is the purpose of formal documentation.

The documentation for this section of code might look something like this:

…After retrieving our customer information from our vendor, the program processes the data and cleans it up to load into our reporting warehouse. Cleaning up the data means parsing the addresses into multiple columns including house number, street name, street suffix, city, state, and postal code…

Now, when a business user needs to know what your program is doing, you can easily send them the above documentation their way. The documentation also acts as a nice refresher for you, the programmer, when it comes time to revisit the code, as well as any future coworkers who will be new to the project.

Wrap up

All of these techniques are necessary to eliminate code headaches down the road. Learn from my experience — not doing all three may save a little bit of time in the short term but it will hurt at some point in the future. Once you get in the habit of writing all three kinds of documentation, it will become second nature and make your life (and the lives of your future-self!) much easier.

Comments in code should explain the why not the how:

  • The how should be explained by well named classes and methods
  • Separate documentation still needed for developers and non-developers alike. Think of business users who need to know how your process works and the business logic that is in it; nice to have high-level document explaining the business uses of your process that someone non-technical can understand.

Will Technology Eliminate Your Tech Job?

9f4ed-1lh0mvkliatliiikt0vlyow

This post is a response to this month's T-SQL Tuesday prompt. T-SQL Tuesday was created by Adam Machanic and is a way for SQL users to share ideas about interesting topics. This month's topic is The Times They Are A-Changing.


I think everyone's had the same fear at some point in their career: "Am I going to lose my job because of X?" X can be a variety of things — company reorganizations, positions being outsourced, robotic automation, new software advancements, etc…

I think the answer to this question depends 100% on the type of individual you are and nothing to do with what your job actually is (was).

Being a Linchpin

Seth Godin discusses the concept of a Linchpin in his same-titled book. A Linchpin is someone who is so good at what they do that they become indispensable to their organization. Linchpins are the kind of people who are self-motivated and are able to consistently deliver quality work. They are integral to the operation of a business, even if they don't get all of the glamour of having VP or Director in their title.

And why are Linchpins always guaranteed jobs? In one scenario, Linchpins will outgrow their role and be promoted or find a better job. They are always learning and growing in addition to delivering, and so this is the natural procession. In the alternate scenario, if the Linchpin has to lose his or her current job (ie. think company buyouts where entire departments close), they will either 1) become promoted to elsewhere in the company because management recognizes their great skills or 2) they will have no problem finding work elsewhere, especially with great recommendations from their former employer.

The Cloud, SaaS, PaaS, and other technologies

The past few years have seen many new technologies come into the SQL professional's workspace. Administrators now have the ability to manage their server instances online in the cloud and use new features and functionality that weren't previously available in local-network only instances. Developers also have new tools to interact with cloud instance, but also have totally new functionality available to them from a variety of online services.

As of now, I think most of these new advancements augment our current technology instead of replace it. I think this means that some professionals will choose to not learn about them or how to use them. And it's really easy to justify not learning them — it can be hard for some to find the time to learn something that they can't immediately use.

However, some professionals will be excited and will learn about these new technologies. Even if their environments don't need to use cloud platforms and other new features, they will find small areas in their environment that can use these technologies so they start getting experience using them. Worst case, even if it's not possible to modify something existing with these new tools, these professionals will create sandboxes for themselves and learn to use some of these technologies anyway. By doing this, they will be more confident in using these tools when the time necessitates that they be used.

When it's time to be promoted or to switch jobs, which of the two professionals is more likely to get hired — the one who knows only his or her old technology really well, or the professional who has taken the time to learn these new features even if they didn't have to use them in their old environment?

Is my role of business intelligence developer going to disappear?

I'm a professional learner. Officially I'm a business intelligence developer, but unofficially I also am a web developer, manager, DBA, and electrical engineer. I don't pretend that I am an expert in all of those unofficial capacities (or even the official one!), but I do continually try to improve myself in all of those roles.

Do I worry about having new technologies replace my current job role? No. I do think the tools I use today will be outdated and replaced at some point in the future though.

I imagine some future version of SSRS will be able to generate the majority of the reporting needed for my database based off metadata. Data will continue to evolve and live in environments other than just SQL Server, making my need for SSIS less important — I'll have to learn other ways to transform data, whether through C#, Python, some cloud querying tool, or all of the above. I'll have to get used to not only using data from databases and flat files, but also mixing in data from APIs and cloud storage. Some of this data will be relational but a lot of it will not.

And all of that sounds exciting! Learning new ways of working with data is a thrill because it means I won't get bored working on the same thing year after year. Sure, 10 years from now new technologies will replace my current job — fortunately for me though, by that point I'll be working with those new technologies.

[Video] JSON Usage and Performance in SQL Server 2016

Using JSON because you are lazy is not a good excuse!

Last night I had the privilege to present to the Ohio North SQL Server User Group about JSON in SQL Server 2016. There was a great crowd present (they laughed at all of my terrible jokes so how can they not be great!?) and I had a wonderful time sharing what I know about JSON.

Below you can find my video recording of the presentation as well as the slides and demo code.

Also worth highlighting is OnTopReplica, an open source piece of software I used that basically does picture-in-picture display of another window on your desktop. You can hear everyone get excited by it at 18 minutes and 30 seconds into the video.

Enjoy the resources below :). These as well as resources from other past presentations are available at https://bertwagner.com/presentations/ .

Video

https://youtu.be/r2xFIsdSJ2Q

Slides

https://www.slideshare.net/BertWagner/json-usage-and-performance-in-sql-server-2016

Demo Scripts

https://bertwagner.com/presentations/JSON%20in%20SQL%20Server%202016%20-%20Bert%20Wagner.zip

Who Stuck These Letters In My DateTimes?

7341d-1tspfzbxx6-gyfexbhclupq

Parsing, creating, and modifying JSON in SQL Server 2016 is really easy. JSON dates and times are not.

Coming from a predominantly SQL background, the JSON DateTime format took some getting used to, especially when it came to converting SQL datetimes to JSON and vice versa.

The remainder of this post will get you well on your way to working with JSON date times in SQL Server.

Breakdown of JSON date/time

In SQL Server, datetime2's format is defined as follows:

YYYY-MM-DD hh:mm:ss[.fractional seconds]

JSON date time strings are defined like:

YYYY-MM-DDTHH:mm:ss.sssZ

Honestly, they look pretty similar. However, there are few key differences:

  • JSON separates the date and time portion of the string with the letter T
  • The Z is optional and indicates that the datetime is in UTC (if the Z is left off, JavaScript defaults to UTC). You can also specify a different timezone by replacing the Z with a + or  along with HH:mm (ie. -05:00 for Eastern Standard Time)
  • The precision of SQL's datetime2 goes out to 7 decimal places, in JSON and JavaScript it only goes out to 3 places, so truncation may occur.

Now that we know the key differences between SQL datetime2 and JSON date time strings, let's explore common transformations when working with JSON data in SQL.

Parsing JSON date time into SQL datetime2

The most common operation I perform with these new JSON functions is parsing, so let's start with those. Let's see how we can parse the date/times from JSON using SQL Server 2016's JSON_VALUE() function:

DECLARE @jsonData nvarchar(max) = N'{ "createDate" : "2017-03-28T12:45:00.000Z" }'

-- SQL's JSON_VALUE() will read in the JSON date time as a string
SELECT JSON_VALUE(@jsonData, '$.createDate')
-- Output: 2017-03-28T12:45:00Z

-- If we want to read it in as a SQL datetime2, we need to use a CONVERT() (or a CAST())
SELECT CONVERT(datetime2, JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 2017-03-28 12:45:00.0000000

-- 7 zeroes after the decimal? Our source only had 3 zeroes!
-- Since JSON/JavaScript times have decimal precision to only 3 places, we need to make
-- the precision of datetime2 match
SELECT CONVERT(datetime2(3), JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 2017-03-28 12:45:00.000

-- So now we are returning our UTC date time from JSON, but what if we need to convert it to a different time zone?
-- Using SQL Server 2016's AT TIME ZONE with CONVERT() will allow us to do that easily.
-- To get a full list of time zone names, you can use SELECT * FROM sys.time_zone_info
SELECT CONVERT(datetime2(3), JSON_VALUE(@jsonData, '$.createDate')) AT TIME ZONE 'Eastern Standard Time'
-- Output: 2017-03-28 12:45:00.000 -04:00

-- What if we just need to grab the date?  Pretty easy, just CONVERT() to date
SELECT CONVERT(date, JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 2017-03-28

--Same with just the time, just remember to use a precision value of 3
SELECT CONVERT(time(3), JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 12:45:00.000

Inserting SQL datetime2 into JSON

Taking date/time data out of JSON and into SQL was pretty easy. What about going the opposite direction and inserting SQL date/time data into JSON?

DECLARE @sqlData datetime2 = '2017-03-28 12:45:00.1234567'

-- Let's first try the simplest SQL to JSON conversion first using FOR JSON PATH
SELECT @sqlData as SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28T12:45:00"}]

-- Honestly that's not too bad!
-- The datetime gets created in the YYYY-MM-DDTHH:MM:SS.fffffff format
-- Although this is pretty much what we need, what if we want to be explicit and specify that we are in UTC?
-- Just add the AT TIME ZONE modifier and we will get our JSON "Z" indicating UTC
SELECT @sqlData AT TIME ZONE 'UTC' AS SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28T12:45:00.1234567Z"}]

-- And if we provide a different time zone offset, the JSON is formatted correctly with the +/-HH:MM suffix:
SELECT @sqlData AT TIME ZONE 'Eastern Standard Time' AS SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28T12:45:00.1234567-04:00"}]

-- You might notice that there are 7 fractional second decimal places in all of the above examples.
-- Although out of JSON spec, this is ok!

-- What if we just want to insert the date?  Just specify with a SQL CONVERT()
SELECT CONVERT(date, @sqlData) as SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28"}]

-- And the same goes with the time portion
SELECT CONVERT(time, @sqlData) as SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"12:45:00.1234567"}]

Modifying JSON date time with SQL

So we've seen how easy it is to parse and create JSON date/time strings, but what about modifying JSON data?

DECLARE @sqlDate datetime2 = '2017-03-28 12:45:00.1234567'



DECLARE @jsonData nvarchar(max) = N'{ "createDate" : "2017-03-28T12:45:00.000Z" }'
        ,@newDate datetime2(3) = '2017-03-28T12:48:00.123Z'

-- Let's start out modifying our data by replacing the value completely
SELECT JSON_VALUE(@jsonData, '$.createDate')

-- If we want to pass in a perfectly formatted JSON string, then it's pretty easy
SELECT JSON_MODIFY(@jsonData, '$.createDate', '2017-03-28T12:48:00.123Z')
-- Output: { "createDate" : "2017-03-28T12:48:00.123Z" }

-- If we want to pass in a SQL datetime2 value, say like what we have stored in @newDate, then things get a little messy.
-- The JSON_MODIFY function requires the third argument to be the nvarchar datatype.  This means
-- we need to get our SQL datetime2 into a valid JSON string first.  

-- If we use FOR JSON PATH to create the JSON date from the SQL datetime2, things get ugly because 
-- FOR JSON PATH always creates a property : value combination
SELECT JSON_MODIFY(@jsonData, '$.createDate', (SELECT @newDate as newDate FOR JSON PATH))
-- Output: { "createDate" : [{"newDate":"2017-03-28T12:48:00.123"}] }

-- In order to only pass the JSON datetime into the value for the "createDate" property, we need to 
-- use the CONVERT style number 127 to convert our dateTime to a JSON format
SELECT JSON_MODIFY(@jsonData, '$.createDate', (SELECT CONVERT(nvarchar, @newDate, 127)))
-- Output: { "createDate" : "2017-03-28T12:48:00.123" }

-- But what happened to our "Z" indicating UTC?  
-- We of course need to specify the AT TIME ZONE again:
SELECT JSON_MODIFY(@jsonData, '$.createDate', (SELECT CONVERT(nvarchar, @newDate AT TIME ZONE 'UTC', 127)))
--Output: { "createDate" : "2017-03-28T12:48:00.123Z" }

Overall, working with JSON dates/times is really easy using SQL Server 2016's new JSON functions. Microsoft could have done a really bad job not following the ECMA standards, but they did a great job crossing their T's and placing their Z's.