Gaps and Islands Across Date Ranges

Watch this week's video on YouTube

In a traditional gaps and islands problem, the goal is to identify groups of continuous data sequences (islands) and groups of data where the sequence is missing (gaps).

While many people encounter gaps and islands problems when dealing with ranges of dates, and recently I did too but with an interesting twist:

How do you determine gaps and islands of data that has overlapping date ranges?

Overlapping Date Ranges

First let's start with some sample data to help follow along. The peculiarity to pay attention to is that the date ranges for each row might be completely discrete, completely enclosed, or they may overlap each other on either end:

DROP TABLE IF EXISTS #OverlappingDateRanges;
CREATE TABLE #OverlappingDateRanges (StartDate date, EndDate date);

INSERT INTO #OverlappingDateRanges
SELECT '8/24/2017', '9/23/2017'  UNION ALL
SELECT '8/24/2017', '9/20/2017'  UNION ALL 
SELECT '9/23/2017', '9/27/2017'  UNION ALL 
SELECT '9/25/2017', '10/10/2017' UNION ALL
SELECT '10/17/2017','10/18/2017' UNION ALL 
SELECT '10/25/2017','11/3/2017'  UNION ALL 
SELECT '11/3/2017', '11/15/2017'

SELECT * FROM #OverlappingDateRanges;

2019-03-13_20-13-13

What's unusual about this data is that while the end date of some rows matches the start date of other rows (eg. row 6 and 7), the date ranges of some rows other rows are either fully contained within other rows (eg. row 2 is contained in row 1) while other rows only overlap one boundary (eg. row 4's EndDate doesn't overlap with any other rows, but its StartDate is before row 3's EndDate).

Solution

While there are several ways gaps and islands problems can be solved, here is the solution using window functions that made the most sense to me.

First, we need to create a row number column based on the sequence of start and end dates, as well as bring the previous row's EndDate to the current row:

SELECT
    ROW_NUMBER() OVER(ORDER BY StartDate,EndDate) AS RN,
    StartDate,
    EndDate,
    LAG(EndDate,1) OVER (ORDER BY StartDate, EndDate) AS PreviousEndDate
FROM
    #OverlappingDateRanges

2019-03-06-20-52-58

Next we add two more fields:

  • IslandStartInd: indicates when a new island begins by looking if the current row's StartDate occurs after the previous row's EndDate. We don't really need this column for the example, but I find it helpful to see what's going on in the next column.
  • IslandId: indicates which island number the current row belongs to.
SELECT
    *,
    CASE WHEN Groups.PreviousEndDate >= StartDate THEN 0 ELSE 1 END AS IslandStartInd,
    SUM(CASE WHEN Groups.PreviousEndDate >= StartDate THEN 0 ELSE 1 END) OVER (ORDER BY Groups.RN) AS IslandId
FROM
(
    SELECT
        ROW_NUMBER() OVER(ORDER BY StartDate,EndDate) AS RN,
        StartDate,
        EndDate,
        LAG(EndDate,1) OVER (ORDER BY StartDate, EndDate) AS PreviousEndDate
    FROM
        #OverlappingDateRanges
) Groups

2019-03-06-21-01-51

The IslandId field is just a SUM() of the IslandStartInd, similar to a window function running total.

Our final step is to aggregate our rows using an old fashioned GROUP BY to return the minimum and maximum start and end dates respectively from each of our islands:

SELECT
    MIN(StartDate) AS IslandStartDate,
    MAX(EndDate) AS IslandEndDate
FROM
    (
    SELECT
        *,
        CASE WHEN Groups.PreviousEndDate >= StartDate THEN 0 ELSE 1 END AS IslandStartInd,
        SUM(CASE WHEN Groups.PreviousEndDate >= StartDate THEN 0 ELSE 1 END) OVER (ORDER BY Groups.RN) AS IslandId
    FROM
    (
        SELECT
            ROW_NUMBER() OVER(ORDER BY StartDate,EndDate) AS RN,
            StartDate,
            EndDate,
            LAG(EndDate,1) OVER (ORDER BY StartDate, EndDate) AS PreviousEndDate
        FROM
            #OverlappingDateRanges
    ) Groups
) Islands
GROUP BY
    IslandId
ORDER BY 
    IslandStartDate

2019-03-06-21-03-35

Voilà

Regardless of how messy the date ranges within an island are, this technique neatly identifies gaps in the data and returns the start and end of each island's date range. I often find myself using this technique in scenarios where I need to aggregate some date-based transactional data that otherwise would be too difficult to summarize with aggregate functions alone.

Shortchanged with International Money in SQL Server

Watch this week's video on YouTube

Imagine you have to perform some salary analysis for your employer International Mega Corp.

The data you have to work with looks something like this:

DROP TABLE IF EXISTS ##InternationalMegaCorpSalaries;
GO
CREATE TABLE ##InternationalMegaCorpSalaries
(
    EmployeeId int IDENTITY,
    Salary nvarchar(10),
    Country nvarchar(20),
);
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('1,000.00','United States');
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('8 789,37','Sweden');
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('6.274,26','Turkey');
INSERT INTO ##InternationalMegaCorpSalaries VALUES ('1000.00','United States');

2018-09-21_12-48-56

Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods?

Great question!  Someone wanted to make sure these amounts would look good in the UI so storing the formatted values in the database would be the way to go...

Pretty for the UI, not really great for needing to do analysis on.

Thanks to Zanoni Labuschagne, one of the subscribers to my YouTube channel, for recommending this topic!

CONVERT!

I'm a firm believer that money values should always be stored in the decimal datatype.  I can't think of a time where I wouldn't care about the precision and accuracy of money.

So let's try converting our salaries to decimal(10,2):

SELECT 
    EmployeeId,
    TRY_CONVERT(decimal(10,2),Salary) AS Salary,
    Country 
FROM ##InternationalMegaCorpSalaries

image-3

Well, that didn't work.  Maybe converting to floats will work as a quick fix?

SELECT 
    EmployeeId,
    TRY_CONVERT(float,Salary) AS Salary,
    Country 
FROM ##InternationalMegaCorpSalaries

image-4

Nope that didn't work either

(sidenote: I'm OK with that though - I don't think float should ever be used for storing money.  If you want to see a quick example of why float math is problematic take a look at this (and for more detail read about it here):

DECLARE 
    @Num1 float = .15,
    @Num2 float = .15,
    @Num3 float = .1,
    @Num4 float = .2
-- Not equal
SELECT IIF(@Num1+@Num2 = @Num3+@Num4,1,0)

Ok so those didn't work.  What if we try converting to the money datatype - that should work for being able to read these money formats right?

SELECT 
    EmployeeId,
    TRY_CONVERT(money,Salary) AS Salary,
    Country 
FROM ##InternationalMegaCorpSalaries

image

50% correct...!  Our employees in Turkey are being seriously underpaid with conversion though.  I'm kind of glad to not have to rely on this solution though since the money datatype has its own fair share of problems as well.

Time to get Cultured

SQL Server's TRY_PARSE function might be able to help us, but first we need to create a relationship between each country's money formatting and it's culture code:

ALTER TABLE ##InternationalMegaCorpSalaries
ADD CultureCode AS CASE Country 
                        WHEN 'United States' THEN 'en-US'
                        WHEN 'Sweden' THEN 'sv-SE'
                        WHEN 'Turkey' THEN 'Tr-TR' END

And finally our SELECT query:

SELECT 
    EmployeeId,
    TRY_PARSE(Salary AS DECIMAL(10,2) USING CultureCode) AS Salary,
    Country 
FROM ##InternationalMegaCorpSalaries

image-1

Success!  Our salary values are now perfectly converted into the decimal datatype without the need for any ugly REPLACE(), SUBSTRING(), or other string parsing functions.

While this carefully curated demo correctly converted all of our values, it's important to always test that the culture value you choose correctly formats your string formatted number.  For example, Wikipedia leads me to believe that the  Danes write their numbers like "6 338,70" SQL Server's culture definition doesn't convert this correctly:

SELECT TRY_PARSE('6 338,70' AS DECIMAL(10,2) USING 'da-dk')

image-2

In those instances, you may need to substitute another culture code to get the correct conversion to occur.

Who Stuck These Letters In My DateTimes?

7341d-1tspfzbxx6-gyfexbhclupq

Parsing, creating, and modifying JSON in SQL Server 2016 is really easy. JSON dates and times are not.

Coming from a predominantly SQL background, the JSON DateTime format took some getting used to, especially when it came to converting SQL datetimes to JSON and vice versa.

The remainder of this post will get you well on your way to working with JSON date times in SQL Server.

Breakdown of JSON date/time

In SQL Server, datetime2's format is defined as follows:

YYYY-MM-DD hh:mm:ss[.fractional seconds]

JSON date time strings are defined like:

YYYY-MM-DDTHH:mm:ss.sssZ

Honestly, they look pretty similar. However, there are few key differences:

  • JSON separates the date and time portion of the string with the letter T
  • The Z is optional and indicates that the datetime is in UTC (if the Z is left off, JavaScript defaults to UTC). You can also specify a different timezone by replacing the Z with a + or  along with HH:mm (ie. -05:00 for Eastern Standard Time)
  • The precision of SQL's datetime2 goes out to 7 decimal places, in JSON and JavaScript it only goes out to 3 places, so truncation may occur.

Now that we know the key differences between SQL datetime2 and JSON date time strings, let's explore common transformations when working with JSON data in SQL.

Parsing JSON date time into SQL datetime2

The most common operation I perform with these new JSON functions is parsing, so let's start with those. Let's see how we can parse the date/times from JSON using SQL Server 2016's JSON_VALUE() function:

DECLARE @jsonData nvarchar(max) = N'{ "createDate" : "2017-03-28T12:45:00.000Z" }'

-- SQL's JSON_VALUE() will read in the JSON date time as a string
SELECT JSON_VALUE(@jsonData, '$.createDate')
-- Output: 2017-03-28T12:45:00Z

-- If we want to read it in as a SQL datetime2, we need to use a CONVERT() (or a CAST())
SELECT CONVERT(datetime2, JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 2017-03-28 12:45:00.0000000

-- 7 zeroes after the decimal? Our source only had 3 zeroes!
-- Since JSON/JavaScript times have decimal precision to only 3 places, we need to make
-- the precision of datetime2 match
SELECT CONVERT(datetime2(3), JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 2017-03-28 12:45:00.000

-- So now we are returning our UTC date time from JSON, but what if we need to convert it to a different time zone?
-- Using SQL Server 2016's AT TIME ZONE with CONVERT() will allow us to do that easily.
-- To get a full list of time zone names, you can use SELECT * FROM sys.time_zone_info
SELECT CONVERT(datetime2(3), JSON_VALUE(@jsonData, '$.createDate')) AT TIME ZONE 'Eastern Standard Time'
-- Output: 2017-03-28 12:45:00.000 -04:00

-- What if we just need to grab the date?  Pretty easy, just CONVERT() to date
SELECT CONVERT(date, JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 2017-03-28

--Same with just the time, just remember to use a precision value of 3
SELECT CONVERT(time(3), JSON_VALUE(@jsonData, '$.createDate'))
-- Output: 12:45:00.000

Inserting SQL datetime2 into JSON

Taking date/time data out of JSON and into SQL was pretty easy. What about going the opposite direction and inserting SQL date/time data into JSON?

DECLARE @sqlData datetime2 = '2017-03-28 12:45:00.1234567'

-- Let's first try the simplest SQL to JSON conversion first using FOR JSON PATH
SELECT @sqlData as SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28T12:45:00"}]

-- Honestly that's not too bad!
-- The datetime gets created in the YYYY-MM-DDTHH:MM:SS.fffffff format
-- Although this is pretty much what we need, what if we want to be explicit and specify that we are in UTC?
-- Just add the AT TIME ZONE modifier and we will get our JSON "Z" indicating UTC
SELECT @sqlData AT TIME ZONE 'UTC' AS SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28T12:45:00.1234567Z"}]

-- And if we provide a different time zone offset, the JSON is formatted correctly with the +/-HH:MM suffix:
SELECT @sqlData AT TIME ZONE 'Eastern Standard Time' AS SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28T12:45:00.1234567-04:00"}]

-- You might notice that there are 7 fractional second decimal places in all of the above examples.
-- Although out of JSON spec, this is ok!

-- What if we just want to insert the date?  Just specify with a SQL CONVERT()
SELECT CONVERT(date, @sqlData) as SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"2017-03-28"}]

-- And the same goes with the time portion
SELECT CONVERT(time, @sqlData) as SQLDateTime2 FOR JSON PATH
-- Output: [{"SQLDateTime2":"12:45:00.1234567"}]

Modifying JSON date time with SQL

So we've seen how easy it is to parse and create JSON date/time strings, but what about modifying JSON data?

DECLARE @sqlDate datetime2 = '2017-03-28 12:45:00.1234567'



DECLARE @jsonData nvarchar(max) = N'{ "createDate" : "2017-03-28T12:45:00.000Z" }'
        ,@newDate datetime2(3) = '2017-03-28T12:48:00.123Z'

-- Let's start out modifying our data by replacing the value completely
SELECT JSON_VALUE(@jsonData, '$.createDate')

-- If we want to pass in a perfectly formatted JSON string, then it's pretty easy
SELECT JSON_MODIFY(@jsonData, '$.createDate', '2017-03-28T12:48:00.123Z')
-- Output: { "createDate" : "2017-03-28T12:48:00.123Z" }

-- If we want to pass in a SQL datetime2 value, say like what we have stored in @newDate, then things get a little messy.
-- The JSON_MODIFY function requires the third argument to be the nvarchar datatype.  This means
-- we need to get our SQL datetime2 into a valid JSON string first.  

-- If we use FOR JSON PATH to create the JSON date from the SQL datetime2, things get ugly because 
-- FOR JSON PATH always creates a property : value combination
SELECT JSON_MODIFY(@jsonData, '$.createDate', (SELECT @newDate as newDate FOR JSON PATH))
-- Output: { "createDate" : [{"newDate":"2017-03-28T12:48:00.123"}] }

-- In order to only pass the JSON datetime into the value for the "createDate" property, we need to 
-- use the CONVERT style number 127 to convert our dateTime to a JSON format
SELECT JSON_MODIFY(@jsonData, '$.createDate', (SELECT CONVERT(nvarchar, @newDate, 127)))
-- Output: { "createDate" : "2017-03-28T12:48:00.123" }

-- But what happened to our "Z" indicating UTC?  
-- We of course need to specify the AT TIME ZONE again:
SELECT JSON_MODIFY(@jsonData, '$.createDate', (SELECT CONVERT(nvarchar, @newDate AT TIME ZONE 'UTC', 127)))
--Output: { "createDate" : "2017-03-28T12:48:00.123Z" }

Overall, working with JSON dates/times is really easy using SQL Server 2016's new JSON functions. Microsoft could have done a really bad job not following the ECMA standards, but they did a great job crossing their T's and placing their Z's.