Azure SQL vs Azure Table Storage

Published on: 2019-12-03

Watch this week’s episode on YouTube.

A year ago I built an app to keep track of pickup volleyball game scores and payments. It works well, but after a year of regular use it’s time to update it with some improvements.

As part of the update process, I’m rewriting the data layer to use Azure’s NoSQL Table Storage instead of Azure SQL.

Today I’ll walk you through some of the details I considered when deciding to switch to a NoSQL storage solution.

Why Azure NoSQL Table Storage

Cost

Azure SQL has been good to me – it works, it’s familiar, and it’s relatively inexpensive. However, even though the $5/month Azure SQL bill isn’t cost prohibitive (especially with free credits :)), ideally I’d love the app to be able to fund itself via the fractions of a penny that get rounded up as part of the app’s balancing logic.

Azure Table Storage in comparison is dirt cheap. As of the time I’m writing this, it will cost me about $.05/month to store and access my < 1gb of application data.

Another option I considered is Azure SQL Serverless. This is a nice alternative because it would allow me to keep the relational structure of my data, however based on my historical app usage patterns it would definitely end up costing more than Azure Table Storage. Cool service, but not a good fit for my scenario.

Simplicity

Another reason I went with Azure Table Storage is its simplicity.

Azure offers an alternative table storage option called Cosmos DB. Cosmos DB is Azure’s premium table storage offering, but it was overkill for what my app needed.

Yes, Cosmo DB’s global distribution is cool. Yes, its additional indexes on my data is great. However, as I mentioned, cost is the biggest factor in how I’m deciding on a storage solution for this project. And Cosmos DB has lots of great features, but they come at a price.

My preference for this app is to pay more up front in development time to create a better design then rely on a service that does some of that for me at a higher monthly cost. In this case, simple features will accomplish what I need, so that’s what I’m going with.

Design Considerations for Azure Table Storage

When rewriting my app’s data layer, there were several new things that I had to account for in Azure Table Storage.

Primary Key

The primary key in Azure Table Storage is made up of two columns: PartitionKey and RowKey. This composite primary key is also the clustered (and only!) index for the table.

The PartitionKey in particular is important because it determines whether related rows of data will exist on the same underlying server or not. Different PartitionKey values may end up on different servers. This can be a good thing if you factor in parallel access in your design, or a significant bottleneck if all of your data ends up residing on a single server.

Latency

There are two aspects of latency to consider.

The first is how far you are from your Azure region. For me, this isn’t a huge deal since the only users of the app are currently in northeast Ohio, so choosing to store all of the data in the same region is good enough.

The bigger consideration for this app is that with Azure Table Storage, you can’t do any joins with your data. Well, you can join in your app, but you can’t join in Azure itself.

This means that design is critical to reducing latency since joining multiple tables of data in your app will require multiple calls to the Azure Table Storage service. This is not something that is necessarily a deal breaker, just something that needs to be considered, especially if coming from a relational SQL background where you are using JOINs to filter and reduce your data before it is returned to your app.

Row Size Limitations

NoSQL gets a lot hate for its common pattern of storing giant blobs of semi-structured data in a single field (this design makes sense when considering the latency considerations).

However, this becomes a delicate balancing act since each row in Azure Table Storage can only be a maximum of 1mb in size. This causes you to want to fit as much data into a single call as possible (to reduce the number of calls) while also not exceeding the 1mb row size.

Azure Table Storage does allow up to 252 columns of data per table (plus the required PartitionKey, RowKey, and Timestamp columns) so at least your 1mb of data will be organized.

…and more!

The above details were the primary considerations I had to take into account for my specific app. There are things data like data consistency, durability, and more that you may want to take into account based on your app’s goals and usage patterns.

Conclusion

While NoSQL can often break many of the relational concepts we are used to using, it is often the means for achieving the cheapest pay-only-for-what-you-use pricing on cloud providers.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

When a SQL UPDATE Statement DELETES Rows

Published on: 2019-11-26

Watch this week’s episode on YouTube.

At first I wasn’t going to write this post. I thought it would be too simple.

Then I watched two experienced SQL developers go through similar scenarios last week.

Sometimes the simple things are worth revisiting, so…

A SQL Server UPDATE Bug

I received a message from someone stating that when they update a row in their table, the row gets deleted.

Thinking that was strange, I asked the user if they could reproduce the issue. And they did. And after running the update statement, the row disappeared.

“WHAT THE…?”

Reproducing the Issue

So here’s the scenario: we had an SSIS configuration table that looks something like this:

DROP TABLE IF EXISTS ##Configuration;
CREATE TABLE ##Configuration	
(
	ConfigurationFilter nvarchar(255) PRIMARY KEY,
	ConfiguredValue nvarchar(255),
	ConfiguredValueType nvarchar(20)
	-- some other fields
);

If you use SSIS, you might be familiar with this setup. In the table we had some innocuous looking rows:

INSERT INTO ##Configuration VALUES ('AdventureWorks_ETL_Bypass','1','int');
INSERT INTO ##Configuration VALUES ('WideWorldImporters_ETL_Bypass','0','int');
INSERT INTO ##Configuration VALUES ('Northwind_ETL_Bypass','1','int');

Querying a single ConfigurationFilter value returns a single row:

SELECT * FROM ##Configuration WHERE ConfigurationFilter = 'AdventureWorks_ETL_Bypass'
1 record found

Let’s say we now want to update the 1 value to a 0::

UPDATE ##Configuration SET ConfigurationFilter = '0' 
WHERE ConfigurationFilter = 'AdventureWorks_ETL_Bypass'

Then, let’s check to see if our change went through:

SELECT * FROM ##Configuration WHERE ConfigurationFilter = 'AdventureWorks_ETL_Bypass'
no rows!

“WHAT THE …?”

Following the Rules

Do you see the problem?

Of course you do. But in the excitement of the moment, I didn’t see the issue. I thought there was some SQL Server feature taking over that I didn’t understand. Or possibly a bug in how UPDATE works in certain scenarios. How could an UPDATE possibly DELETE a row of data?

Look at the above UPDATE statement again. Our WHERE clause is filtering on the ConfigurationFilter field, which in this case is our table’s primary key; it will only ever return one unique row.

That is until we change the value of that row’s primary key: the SET clause is also updating ConfigurationFilter. This is the mistake. Since ConfigurationFilter has a different value after the update, our original query makes it appear that the row was deleted – when in reality it is now considered a different row based on the way we defined our primary key:

changed it to a 0...whoops

Since there a lot of “Config…” names in this table, the field used in the SET statement should have been ConfiguredValue instead of ConfigurationName. Simple case of updating the wrong field.

Lesson Learned

SQL Server has been thoroughly vetted by running on millions(?) of systems. Bugs do exist, but the chances of you discovering a bug, let alone one that affects such a basic feature such as UPDATE, is very unlikely at this point.

The lesson here is that if you do think you find an issue, go back and check your query: it’s more likely there was an error with the connection between chair and keyboard rather than with the tool itself.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!

First Time Speaking at PASS Summit

Published on: 2019-11-12

This week, I filmed my experience as a first time speaker at PASS Summit in Seattle. No blog post, just a video to relive the experience.

Thanks for reading. You might also enjoy following me on Twitter.

Want to learn even more SQL?

Sign up for my newsletter to receive weekly SQL tips!