Forecast Cloudy – NoSQL with Azure Table Storage Tutorial

Recently one of my customers started doing a lot more with Azure PaaS, so I started spending a lot more time in Azure as well. Being a data guy in general, one of the most interesting things to me became SQL Azure databases and their scalability via Federations/Sharding approaches , but also Azure PaaS NoSQL data storage and retrieval mechanisms, mainly Azure Table Storage and Blob Storage.

Windows Azure Tables are a non-relational, key-value-pair, storage system suitable for storing massive amounts of unstructured data. Whereas relational stores such as SQL Server, with highly normalized designs, are optimized for storing data so that queries are easy to produce, the non-relational stores like Table Storage are optimized for simple retrieval and fast inserts. They key here is exactly that – simple retrieval. You will not query tables based on complex attributes to return large amounts of data as they are not built for that, instead they are easy way to retrieve a key-value pair record.

Table entities represent the units of data stored in a table and are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is key/value pair defined by its name, value, and the value’s data type. Entities must define the following three system properties as part of the property collection:

  • PartitionKey – The PartitionKey property stores string values that identify the partition that an entity belongs to. This means that entities with the same PartitionKey values belong in the same partition. Partitions, as discussed later, are integral to the scalability of the table.
  • RowKey – The RowKey property stores string values that uniquely identify entities within each partition.
  • Timestamp – The Timestamp property provides traceability for an entity. A timestamp is a DateTime value that tells you the last time the entity was modified. A timestamp is sometimes referred to as the entity’s version. Modifications to timestamps are ignored because the table service maintains the value for this property during all inserts and update operations.

The primary key for any database table defines the columns that uniquely define each row. The same is true with Azure tables. The primary key for an Azure table are the PartitionKey and RowKey properties which form a single clustered index within the table. Each PartitionKey and RowKey properties are allowed to store up to 1 KB of string values. Empty strings are also allowed however null values are not. The clustered index is sorted by the PartitionKey in ascending order and then by RowKey also in ascending order. The sort order is observed in all query responses.Partitions represent a collection of entities with the same PartitionKey values. Partitions are always served from one partition server and each partition server can serve one or more partitions. A partition server has a rate limit of the number of entities it can serve from one partition over time. Specifically, a partition has a scalability target of 500 entities per second. This throughput may be higher during minimal load on the storage node, but it will be throttled down when the storage node becomes hot or very active. MSDN has a very good graphical explanation of partition and row key relationship

image

Because a partition is always served from a single partition server and each partition server can serve one or more partitions, the efficiency of serving entities is correlated with the health of the server. Servers that encounter high traffic for their partitions may not be able to sustain a high throughput. For example, in the MSDN example above, if there are many requests for “2011 New York City Marathon__Full”, server B may become too hot. To increase the throughput of the server, the storage system load-balances the partitions to other servers. The result is that the traffic is distributed across many other servers. For optimal load balancing of traffic, you should use more partitions because it will allow the Azure Table service to distribute the partitions to more partition servers.

So how do I create, insert rows and query from Azure Tables. Lets do a quick walkthrough:

First I will go to my Azure Management Portal and create new storage account. If you don’t have Azure benefits, you can sign up for free trial via – http://azure.microsoft.com/en-us/pricing/free-trial/ or if you have MSDN subscription from Microsoft Azure benefits are included as well.

Clicking new on bottom left gives you this screen:

image

You will now need to provide a name for your storage account in the URL textbox. This name is used as part of the URL for the service endpoint and so it must be globally unique. The portal will alert you if you select a name that is already in use. The next step is for you to select a location for your storage account by choosing one of the data center locations in the dropdown. This location will be the primary storage location for your data, or more simply, your account will reside in this Data Center. You may also see an additional dropdown if you have more than one Windows Azure subscription. This allows you to select the subscription that the account will be related to.

Azure storage accounts are stored in three copy, with transactional-consistent copies in the primary data center. In addition to that redundancy, you can also choose to have “Geo Replication’” enabled for the storage account. “Geo Replication” means that the Windows Azure Table data that you place into the account will be replicated in triplicate to another data center within the same region. So, if you select ‘East US’ for your primary storage location, your account will also have a another full copy stored in the West US data center.  Storage accounts that have Geo Replication enabled are referred to as geo redundant storage (GRS) and cost slightly more than accounts that do not have it enabled, which are called locally redundant storage (LRS).

Click on the ‘Manage Access Keys’ at the bottom of the screen to display the storage account name and two 512 bit storage access keys used to authenticate requests to the storage account. Whoever has these keys will have complete control over your storage account short of deleting the entire account. They would have the ability to upload BLOBs, modify table data and destroy queues. These account keys should be treated as a secret in the same way that you would guard passwords or a private encryption key.

Now that we created storage account and have necessary access keys lets create some code that will create Azure Table, connect to it and retrieve data.  We will fire up Visual Studio and code up a quick solution.  I will create new Cloud Services Project with new PaaS Worker Role

Next I will add to my solution NuGet package for Windows Table Storage as well. Right click on Solution and Manage NuGet packages and pick Azure Storage:

image

Next, we need to setup connection strings using keys that we have from our storage account in our solution.  Using keys that we got earlier we can now create connection string and store it in configuration. You will get something like:

<connectionStrings>
  <add name="StorageConnectionString" 
connectionString="DefaultEndpointsProtocol=https;AccountName=gennadykstorage;AccountKey=somelargecrazystring" /> </connectionStrings>

 

If you are debugging you code locally in Visual Studio using Storage Emulator you can use following as connection string

setting name="StorageConnectionString" value="UseDevelopmentStorage=true;"

 

 

Now lets go to our worker role Run() method and connect to the table storage programmatically, create table that we will store our sample data in: Add following namespaces to your Worker Role class:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;

Now lets connect:

  //Connect 
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
                // Create the table client.
                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                // Create the table if it doesn't exist.
                CloudTable table = tableClient.GetTableReference("people");
                table.CreateIfNotExists();

So using our connection string (isn’t it like SQL or Oracle a bit?) we connected to storage account and based on Storage Account created CloudTableClient class , based upon that created a table called “people”, if that table doesn’t already exist.

Next lets add few sample rows. Note that my partition key here is Last Name and row key is First Name. In real world application you wouldn’t use such keys as you probably would choose data where combination of partition Key and row key is unique (see explanations above). That could have been CustomerID, AccountID, SSN#, etc.

In order to work easily with data I will be adding and retrieving I will create CustomerEntity class and inherit from TableEntity , and add it to my Worker Role project:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;

namespace WorkerRole1
{
    class CustomerEntity:TableEntity
    {
        public CustomerEntity() { }

        public CustomerEntity(string lastName, string firstName)
        {
            this.PartitionKey = lastName;
            this.RowKey = firstName;
        }


        public string Email { get; set; }

        public string PhoneNumber { get; set; }
    }

}

That makes addition of record quite simple:

                // Create a new customer entity.
                CustomerEntity customer1 = new CustomerEntity("Doe", "Jerry");
                customer1.Email = "jerry.doe@contoso.com";
                customer1.PhoneNumber = "425-555-0101";

                // Create the TableOperation that inserts the customer entity.
                TableOperation insertOperation1 = TableOperation.InsertOrReplace(customer1);

                // Execute the insert operation.
                table.Execute(insertOperation1);

                //Create another customer entity
                CustomerEntity customer2 = new CustomerEntity("Smith", "John");
                customer2.Email = "John.Smith@contoso.com";
                customer2.PhoneNumber = "425-555-0101";

                // Create the TableOperation that inserts the customer entity.
                TableOperation insertOperation2 = TableOperation.InsertOrReplace(customer2);

                // Execute the insert operation.
                table.Execute(insertOperation2);

Looking at Server Explorer in VS I see my table and rows:

image

Now lets retrieve these rows using partition key and row key. First lets query filtering on a range and throw results via Trace into Azure Diagnostics instead of usual Console.WriteLine

 

  //retrieve range of records from table where last name greater than A and First Name greater than B



                string filterA = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Smith");
                string filterB = TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, "John");
                string combined = TableQuery.CombineFilters(filterA, TableOperators.And, filterB);
                TableQuery query = new TableQuery().Where(combined); 

                
                // Loop through the results, displaying information about the Customer entity.
                foreach (CustomerEntity entity in table.ExecuteQuery(query))
                {
                   Trace.WriteLine(" " + entity.PartitionKey.ToString() + " " + entity.RowKey.ToString() + " " +  entity.Email.ToString() + " " +  entity.PhoneNumber.ToString());
                   Trace.Flush();
                }

If you are retrieving single record it will be even easier:

//

  //get just one row matching one record

                // Create a retrieve operation that takes a customer entity.
                TableOperation retrieveOperation = TableOperation.Retrieve("Smith", "John");

                // Execute the retrieve operation.
                TableResult retrievedResult = table.Execute(retrieveOperation);

                if (retrievedResult!=null)
                {
                    Trace.WriteLine((((CustomerEntity)retrievedResult.Result).PartitionKey).ToString());
                    Trace.Flush();
                }

Why would I use NoSQL concept like Table Storage in Azure vs. SQL?  In case of Table Storage,  all NoSQL concepts apply:

  • no relations between tables (or entities sets)
  • entities are simply a set of key-value pairs
  • tables are schema-less (i.e. each entity can have a different schema, even in the same table)
  • there is limited support for keys and indexes within tables.

So Azure Table storage is great at storing non-relational key-value pair data. Huge rule I learned while working with it – Never  query tables on properties that are not partition and row key or you will see performance degradation. The idea is that when using Partition and Row Keys, the storage will use its (binary and distributed) index to find results really fast, while when using other entity properties the storage will have to scan everything sequentially, significantly reducing performance. So, querying on the Partition Key is good, querying on both Partition and Row Key is good too, querying only on Row Key is not good (the storage will fall back to sequential scans).

For more see – http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html, http://blog.tylerdoerksen.com/2013/01/03/table-storage-v2-queries/ and http://blogs.msdn.com/b/jnak/archive/2008/10/28/walkthrough-simple-table-storage.aspx 

This has been different, fun and hope you find it useful.

Advertisements

One thought on “Forecast Cloudy – NoSQL with Azure Table Storage Tutorial

  1. Pingback: Forecast Cloudy – Why Is My Azure Table Storage Query So Slow Again? | A posteriori

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s