Forecast Cloudy – Why Is My Azure Table Storage Query So Slow Again?

Perhaps this post shouldn’t exist as I already profiled basics of Azure Table Storage in my previous post. However, in last few months I heard quite a bit of complaints from customers attempting to query Azure tables of performance. In sheer majority of the cases the issues could have been avoided with proper table and query design. By design, Azure Table services provides the potential to store enormous amounts of data, while enabling efficient access and persistence. The services simplify storage, saving you from jumping through all the hoops required to work with a relational database—constraints, views, indices, relationships and stored procedures. Yet, lest not forget that Azure Table Storage is no way to substitute for relational database – so don’t try to store relational data there, attempt some sort of joins, manage relationships of any kind, etc. Its designed for “quick and dirty” NoSQL storage and access not for analytics. With that established lets again look at design of Azure Table Storage:

Azure Storage Partitions. Looking at quite known Microsoft white paper on Azure Storage, one of the main reasons why storage can scale so well is its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system.

Referenced Screen

Those Partition Keys are critical. In Table Storage, you have to decide on the Partition Key yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there’s lots of entities in the same partition.Should you set the Partition Key to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult.

Row Key – a primary key within given partition. A Row Key in Table Storage is a very simple thing: it’s your “primary key” within a partition. Partition Key + Row Key form the composite unique identifier for an entity. Within one Partition Key, you can only have unique Row Keys. If you use multiple partitions, the same Row Key can be reused in every partition.

Consequences.  For example, consider a products table that contains information about company products and has Partition Keys that correspond to the product types, such as Bicycle,  Tricycle and Scooter. As demand for Bicycles increases and more models\variations are introduced , the rows in the Bicycle partition might be very busy (becoming a so-called “hot” partition). The service can load balance the Products table by moving the Bicycle partition to a different server to 
better handle the many requests made to the partition. If you anticipate more activity on that partition than a single server can handle, you should consider creating more-granular partitions , example Bicycle_Adult and Bicycle_Children.

Before building your code, it’s a good idea to think about both properties. Don’t just assign them a GUID or a random string as it does matter for performance.The fastest way of querying? Specifying both Partition Key and Row Key. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on Row Key within that partition.

Very good guide to designing your Azure Table Storage is available here – https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/ , its way deeper than one can ever go in this blog post. Another good resource is Azure Storage Team Blog and check out excellent post here – http://blogs.msdn.com/b/mast/archive/2013/02/19/improving-windows-azure-table-storage-query-performance.aspx

The performance difference can be rather striking. To illustrate I will do something pretty bad here – will create a table with Partition Key of “People” (therefore will always use single partition)  and GUID Row Key with lots of records (100,000+) and query my table based on non-Partition\Row Key columns.  Here is the code to create 100,000 of fake people in Azure Table that way from console application:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;
using System.Configuration;

namespace GoodAzureStorageQuery
{
    class Program
    {
        static void Main(string[] args)
        {
            //Connect 
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.ConnectionStrings["StorageConnectionString"].ToString());

                //Create Table Clienbt
                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                // Create the table if it doesn't exist.
                CloudTable table = tableClient.GetTableReference("People");
                table.CreateIfNotExists();

                //Fill up that table 
            for (int i=0;i<100000;i++)
            {
                Person p = new Person(i.ToString(), (i + 1).ToString());
                // Create the TableOperation that inserts the customer entity.
                TableOperation insertOperation = TableOperation.InsertOrReplace(p);
                table.Execute(insertOperation);
            }
        }
    }
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;

namespace GoodAzureStorageQuery
{
    class Person:TableEntity
    {
        public string FirstName {get;set;}
        public string LastName { get; set; }

    public Person(string first_name, string last_name)
        {
            Guid g = Guid.NewGuid();
            this.PartitionKey = "People";
            this.RowKey = g.ToString();
            this.LastName=last_name;
            this.FirstName = first_name; 
            
        }
    }
}
}

And sure thing – connecting to Azure from ClumsyLeaf Table explorer I see an table with lots of records in no particular order, as I “designed”:

image

Same in Azure Storage Explorer:

image

The issue occurs when I query this table via something like to query for single person in my table:

var query = from entity in context.CreateQuery("SpecificCustomer")
                 where entity.LastName.Equals("1")
                 && entity.FirstName.Equals("2")
                 select entity;

Or something like this:

var query = (from entity in context.CreateQuery("Top10People")
                 select entity).Take(10);

 

That will take a long time. Again, here I committed multiple sins:

  • Used generic Partition Key – therefore not partitioning my table well at all, but throwing all data on a single partition
  • Used GUID for Row Key
  • Used a query that queries neither on Partition or Row Key

So high level rules to make sure your Azure Tables are performant  are:

    • No relations between tables (or entities sets) – this isn’t an RDBMS
    • Storing datasets that don’t require complex joins, foreign keys, or stored procedures and can be denormalized for fast access
    • Quickly querying data using a clustered index (Partition Key + Row Key)
    • Tables doesn’t have a specified schema
    • Query tables using both Partition and Row Key to achieve best performance with Windows Azure Table Storage.

For more see – http://lockmedown.com/be-sure-with-azure-net-azure-table-storage-part-2/, http://blogs.msdn.com/b/windowsazurestorage/archive/2014/11/17/azure-storage-performance-checklist.aspx, https://msdn.microsoft.com/en-us/magazine/ff796231.aspx

Hope this helps.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s