Window Functions vs GROUP BYs

Published on: 2019-04-16

Watch this week’s episode on YouTube.

There are many options available for improving the performance of a query: indexes, statistics, configuration settings, etc…

However, not all environments allow you to use those features (eg. vendor databases), leaving query rewriting as the only option.

This is the first post in a series to document common ways to refactor queries without otherwise altering the database. The goal of these posts will be to provide examples of performance pitfalls in queries and how to rewrite those queries to generate different query plans that (hopefully) improve performance.

I’ll be using the StackOverflow 2014 data dump for these examples if you want to play along at home.

Who was first to earn each badge?

StackOverflow awards users badges for things like asking good questions, hitting certain vote thresholds, and more.

I want to write a query that figures out who is the first person awarded each badge. In cases where there is a tie for the first person to receive that badge, I want to return the user with the lowest UserId.

Window functions make this type of question easy to write a query for:

    dbo.Badges b

If you’ve used FIRST_VALUE before, this query should be easy to interpret: for each badge Name, return the first UserId sorted by Date (earliest date to receive the badge) and UserId (pick the lowest UserId when there are ties on Date).

This query was easy to write and is simple to understand. However, the performance is not great: it takes 46 seconds to finish returning results on my machine.

Window function execution plan

Note: I assumed this table started off with the following index:

CREATE NONCLUSTERED INDEX IX_Badges__Name_Date_UserId ON [dbo].[Badges] (Name,Date,UserId);

Why so slow?

If we SET STATISTICS IO ON we’ll notice that SQL Server reads 46767 pages from a nonclustered index. Since we aren’t filtering our data, there’s not much we can do to make that faster.

Reading right to left, next up we see two Segment operators. These don’t add much overhead since our data is sorted on our segments/groups, so making SQL Server identify when our sorted rows change values is trivial.

Next up is the Window Spool operator which “Expands each row into the set of rows that represent the window associated with it.” While it looks innocent by having a low relative cost, this operator is writing 8 million rows/reading 16 million rows (because of how Window Spool works) from tempdb. Ouch.

After that the Stream Aggregate operator and Compute Scalar operators check to see if the first value in each window being returned from the Window Spool is null and then return the first non-null value. These operations are also relatively painless since the data flowing through is already sorted.

The Hash Match operator then dedupes the data for our DISTINCT and then we sort the remaining ~2k rows for our output.

So while our query looks simple, the fact that our whole table of data is getting written to and read from tempdb before being deduped and sorted is a real performance killer.

Removing tempdb usage the old-fashioned way

When I say “the old fashioned way”, I mean rewriting our window function to use more traditional aggregate functions and a GROUP BY:

    MIN(b.UserId) AS UserId
    dbo.Badges b
        MIN(Date) AS Date
    ) m
        ON b.Name = m.Name
        AND b.Date = m.Date

I think by most people’s standards, this query is not as easy to read. While not overly complex, it does take up a lot more screen space and is complicated by multiple GROUP BYs and a derived table.

And while the query may look ugly on the outside, it’s what lies below the surface that really matters:

GROUP BY execution plan

What a beautifully simple execution plan. And it finishes executing almost instantly.

Let’s break down what’s going on. First, we start with similar Index Scan and Segment operators as the previous query so no real difference there.

At this point you may have noticed that while the written query uses two GROUP BYs and two MIN functions that are then joined together, there are not two Index Scans, two sets of aggregations, and no join happening in the execution plan.

SQL Server can use an optimization with the Top operator that allows it to take the sorted data and return only the Name and UserId rows for the top Name and Date values within a group (essentially matching the MIN logic). This is a great example of how the optimizer can take a declarative SQL query and decide how to efficiently return the data needs.

At this point, the Top operator filters our 8 million rows down to around 30k rows. 30k rows get deduped a lot faster with our Stream Aggregate operator, and since the data is already sorted we don’t need an extra Sort operator.

Overall, this second query runs so much better than the original because SQL Server doesn’t have to go to tempdb for any operations – all the data is pre-sorted in the index and can flow through.

So I shouldn’t use Window Functions?

Not necessarily – it comes down to a trade offs.

I almost always start with a window function because of how easy they are to write and read. Plus I think they are fun to write as well.

However, if the window function is having to read/write a lot of data to tempdb and it’s affecting the overall performance of your query, a rewrite may be necessary.

In that case, I much rather take more verbose syntax to get a 2000x performance boost.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

9 thoughts on “Window Functions vs GROUP BYs”

  1. Thanks for going through the execution plan. I haven’t even reached novice level with those, so your explanation helped me with understanding a little more what is going on behind the scenes.

  2. Hi Bert, nice decomposition into two GROUP BYs. I don’t have the StackOverflow database. Just wondering how the plan and tempdb usage compares for this SQL:

    WITH cte AS
    SELECT Name,UserId
    FROM cte
    WHERE rn = 1
    ORDER BY Name

    I suspect it may not be as performant as your two GROUP BYs.

  3. Great Post! I read through it and thought, maybe the window function could be salvaged by using a row clause in it’s frame. Unfortunately not. Performance was better but it still took minutes to complete. I am using the older version of the Stack Overflow database. Then I remembered this article that I read a while back from Paul White Titled: Performance Tuning the Whole Query Plan The second part of the article is about performance tuning a query that returns distinct values. I won’t go into details but eventually the query is tuned by using a recursive cte which will seek only the first value and returning that row and moving on to the next value and on and on. I have modified his query keeping the logic in place and plugged in the columns and table from this post. The result was this:

    WITH RecursiveCTE
    — Anchor
    SELECT TOP (1)
    FROM dbo.Badges b


    — Recursive
    — Number the rows
    b.UserId AS UserId,
    rn = ROW_NUMBER() OVER (
    ORDER BY b.Name)
    FROM dbo.Badges b
    JOIN RecursiveCTE AS R
    ON R.Name < b.Name
    ) AS R
    — Only the row that sorts lowest
    R.rn = 1
    UserId AS UserId
    FROM RecursiveCTE

    It looks ugly, but in comparing performance, the double group by completed in 9 seconds for me while the recursive cte returned in less than a second. Below are the statistics io and cpu time differences.

    Double Group By:
    (4543 rows affected)
    Table 'Badges'. Scan count 1, logical reads 139915
    SQL Server Execution Times: CPU time = 9672 ms, elapsed time = 9968 ms.

    Recursive CTE with row numbering:
    (4543 rows affected)
    Table 'Badges'. Scan count 4544, logical reads 18206,
    Table 'Worktable'. Scan count 2, logical reads 27259,
    SQL Server Execution Times: CPU time = 125 ms, elapsed time = 265 ms.

    I don't believe I can post the plan but the plan for the recursive cte looks pretty brutal. I can't explain the details of each operator but I think I can explain it by line. The top line is the anchor and gets the first row and returns it to the index spool which writes to the work table. The second and third lines are the recursive CTE. The second line's table spool grabs the next row from the work table and sends it to the nested loop which pushes down the predicate value to it's inner loop, the third line. The third line seeks data based on the predicate value and returns only one row because of the top operator. This continues until seek operator no longer has any rows to return.

    Again, great post! Keep them coming! I enjoy reading through them and learn a great deal from them as well. Thanks!

    1. Great example James! Thank you for posting it. Paul White always finds amazing ways to make things run faster 🙂 In this case you were able to generate an amazing fast plan, nice!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.