Forecast Cloudy – Azure Blob Storage Introduction

Windows Azure Blob storage can be used to store and retrieve Binary Large Objects (blobs), or what we can also call files. There are many reasons why you may want to use this storage mechanism in Azure PaaS. Offloading static content from a website is the most common, but there are others as well, such as sharing PDF files with customers. If you are using Azure PaaS features such as Web Roles and Worker Roles, storing files in Blob storage instead of in the application package speeds up publishing, especially if the files are large.

There are two kinds of blobs: block blobs and page blobs. You specify the type when you create a blob. Once the blob has been created, its type cannot be changed, and it can be updated only by using operations appropriate for that blob type, i.e., writing a block or list of blocks to a block blob, and writing pages to a page blob. Let's take a look at the differences:

  • Block blobs let you upload large blobs efficiently. Block blobs are made up of blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set of blocks and committing them by their block IDs. Each block can be a different size, up to a maximum of 4 MB. The maximum size for a block blob is 200 GB, and a block blob can include no more than 50,000 blocks. If you are writing a block blob that is no more than 64 MB in size, you can upload it in its entirety with a single write operation.
  • Page blobs are a collection of 512-byte pages optimized for random read and write operations. To create a page blob, you initialize the page blob and specify the maximum size the page blob will grow. To add or update the contents of a page blob, you write a page or pages by specifying an offset and a range that align to 512-byte page boundaries. A write to a page blob can overwrite just one page, some pages, or up to 4 MB of the page blob. Writes to page blobs happen in-place and are immediately committed to the blob. The maximum size for a page blob is 1 TB. Page Blobs are more efficient when ranges of bytes in a file are modified frequently.
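
The size rules above can be turned into a little arithmetic. The sketch below is only an illustration: the BlockPlanner class and its method names are hypothetical, the limits are the ones quoted in this post, and the block ID scheme (a zero-padded counter encoded as Base64 so that all IDs have the same length) is one common convention rather than a requirement of this post.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the storage client library.
public static class BlockPlanner
{
    // Limits quoted in the text above; treat them as assumptions that may change.
    public const int MaxBlockSizeBytes = 4 * 1024 * 1024;
    public const int PageSizeBytes = 512;

    // Number of blocks needed to hold a file of the given size.
    public static long BlocksNeeded(long fileSizeBytes, int blockSizeBytes = MaxBlockSizeBytes)
    {
        if (fileSizeBytes < 0) throw new ArgumentOutOfRangeException("fileSizeBytes");
        if (blockSizeBytes <= 0 || blockSizeBytes > MaxBlockSizeBytes)
            throw new ArgumentOutOfRangeException("blockSizeBytes");
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Encodes a zero-padded counter as Base64 so every block ID has the same length.
    public static string BlockId(int index)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("D6")));
    }

    // Rounds a size up to the next 512-byte page boundary for a page blob.
    public static long AlignToPage(long sizeBytes)
    {
        return (sizeBytes + PageSizeBytes - 1) / PageSizeBytes * PageSizeBytes;
    }
}
```

For example, a 10 MB file split into 4 MB blocks needs three blocks, and a 1,000-byte page blob write has to be padded out to 1,024 bytes.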

Which type to use depends on the scenario. If you need random read/write access, use a page blob; if you access data sequentially, as with video files, use a block blob.

Blob storage itself consists of the following components:

  • Storage Account. All storage access in Azure is done via a storage account.
  • Container. A container provides a grouping of a set of blobs. All blobs must be in a container. An account can contain an unlimited number of containers. A container can store an unlimited number of blobs.
  • Blob. A file of any type and size. As discussed above, there are two types: block blobs and page blobs.

Blobs are addressable using the following URL format:

http://<storage account>.blob.core.windows.net/<container>/<blob>
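
To make the addressing scheme concrete, here is a minimal sketch that assembles a blob URL from its parts. The BlobAddress helper is hypothetical, and "myaccount" in the usage note is a placeholder account name; the host suffix blob.core.windows.net is the standard public endpoint for the Blob service.

```csharp
using System;

// Hypothetical helper, not part of the storage client library.
public static class BlobAddress
{
    // Builds the address of a blob from account, container and blob name.
    public static Uri Build(string account, string container, string blob, bool useHttps = true)
    {
        string scheme = useHttps ? "https" : "http";

        // Escape each segment of the blob name but keep "/" separators intact.
        string[] parts = blob.Split('/');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = Uri.EscapeDataString(parts[i]);
        }

        return new Uri(string.Format("{0}://{1}.blob.core.windows.net/{2}/{3}",
            scheme, account, container, string.Join("/", parts)));
    }
}
```

Calling BlobAddress.Build("myaccount", "gennadykpictures", "dev.jpg") yields https://myaccount.blob.core.windows.net/gennadykpictures/dev.jpg.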

Enough theory? Well, let's get started.

To get started using the Blob service, we first need a Windows Azure account and a storage account. You can sign up for a free trial account or, if you have an MSDN subscription like me, activate the Azure benefits that come with it. Once you have done that, we can create the storage account that will store our blobs.

To create a storage account, log in to the Windows Azure management portal and click the large NEW icon at the bottom left-hand corner of the portal. From the expanding menu select the Data Services option, then Storage, and finally Quick Create:



You will now need to provide a name for your storage account in the URL textbox. This name is used as part of the URL for the service endpoint and so it must be globally unique. The portal will indicate whether the name is available whenever you pause or finish typing. Next, you select a location for your storage account by selecting one of the data center locations in the dropdown. This location will be the primary storage location for your data, or more simply, your account will reside in this data center. If you have created ‘affinity groups’, which are friendly names for collections of services you want to run in the same location, you will also see them in the dropdown list. If you have more than one Windows Azure subscription related to your login address, you may also see a dropdown list that lets you select the Azure subscription that the account will belong to.


All storage accounts are stored in triplicate, with transactionally consistent copies in the primary data center. In addition to that redundancy, you can also choose to have ‘Geo Replication’ enabled for the storage account. ‘Geo Replication’ means that the Windows Azure Table and Blob data that you place into the account will not only be stored in the primary location but will also be replicated in triplicate to another data center within the same region.

The account is now created:


Click on ‘Manage Access Keys’ at the bottom of the screen to display the storage account name, which you provided when you created the account, and two 512-bit storage access keys used to authenticate requests to the storage account. Whoever has these keys will have complete control over your storage account short of deleting the entire account. They would have the ability to upload blobs, modify table data and destroy queues. These account keys should be treated as a secret in the same way that you would guard passwords or a private encryption key. Both of these keys are active and will work to access your storage account. It is a good practice to use one of the keys for all the applications that utilize this storage account so that, if that key becomes compromised, you can use this dialog to regenerate the key you haven’t been using, then update all the apps to use that newly regenerated key, and finally regenerate the compromised key. This prevents anyone from abusing the account with the compromised key.


Now that we have this account, let's do something with it. Here I will create a little console application to work with blob storage.

A new console project does not reference the storage client library, so I will add the Microsoft Azure Storage library via NuGet.

Finally, here is some quick code to upload a picture called dev.jpg that I picked at random from my hard drive:

using System;
using System.Configuration;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;

namespace BlobSample
{
    class Program
    {
        static void Main(string[] args)
        {
            // The account name and key are read from App.config.
            string accountName = ConfigurationManager.AppSettings["accountName"];
            string accountKey = ConfigurationManager.AppSettings["accountKey"];

            try
            {
                StorageCredentials creds = new StorageCredentials(accountName, accountKey);
                CloudStorageAccount account = new CloudStorageAccount(creds, useHttps: true);
                CloudBlobClient client = account.CreateCloudBlobClient();

                // Get the container and create it if it does not exist yet.
                CloudBlobContainer sampleContainer = client.GetContainerReference("gennadykpictures");
                sampleContainer.CreateIfNotExists();

                // Upload the local file as a block blob with the same name.
                CloudBlockBlob blob = sampleContainer.GetBlockBlobReference("dev.jpg");
                using (Stream file = System.IO.File.OpenRead("dev.jpg"))
                {
                    blob.UploadFromStream(file);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }

            Console.WriteLine("Done... press a key to end.");
            Console.ReadKey();
        }
    }
}

Note that I am storing the account name and account key that we got from the Manage Access Keys page in my App.config. To check whether the image was uploaded, you can use a tool like Azure Explorer from Cerebrata. When I connect Azure Explorer to my storage account, I can clearly see my image under the container “gennadykpictures”:


Well, how do I download a file programmatically? You still use the CloudBlockBlob class, but instead of the UploadFromStream method you call DownloadToStream, like this:

CloudBlockBlob blob = sampleContainer.GetBlockBlobReference("dev.jpg");

using (Stream outputFile = new FileStream("dev.jpg", FileMode.Create))
{
    blob.DownloadToStream(outputFile);
}

When writing solutions for the cloud, you must program defensively. Cloud solutions are often comprised of multiple sometimes-connected products/features that rely on each other to work. Any of those bits could stop working for some reason – Azure websites could go down, the network between the VM and the backend service could start denying access for some reason, the disk your blobs are stored on could hit a bad patch and the controller could be in the process of repointing your main blob storage to one of the replicas. You have to assume any of these things (and more) can happen, and develop your code so that it handles any of these cases.

Obviously my sample above isn’t written that way. It has no retry logic on failure, and its error handling is minimal. If this were a production application, I would have to apply proper coding and architecture techniques: proper error handling and logging, a retry policy on failure, handling of storage throttling in Azure, and so on.
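
As an illustration of what retry logic could look like, here is a minimal sketch of a generic retry helper with exponential backoff. The Retry class, its parameter names and its default values are all hypothetical. It retries on any exception, which is cruder than a production policy that would retry only on transient failures.

```csharp
using System;
using System.Threading;

// Hypothetical helper; a sketch of retry with exponential backoff.
public static class Retry
{
    // Runs the operation, waiting baseDelayMs, 2x, 4x, ... between failed attempts.
    public static T Execute<T>(Func<T> operation, int maxAttempts = 3, int baseDelayMs = 200)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                // Out of attempts: let the last exception propagate to the caller.
                if (attempt >= maxAttempts) throw;

                int delayMs = baseDelayMs * (1 << (attempt - 1));
                Console.WriteLine("Attempt {0} failed ({1}); retrying in {2} ms", attempt, ex.Message, delayMs);
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

An upload could then be wrapped as Retry.Execute(() => { blob.UploadFromStream(file); return true; }), remembering to rewind the stream before each attempt. The storage client library also ships its own configurable retry policies, which are usually a better starting point than a hand-rolled loop.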


This has been different, fun, and I hope you find it useful.

Save Our Souls – Troubleshooting Heap Corruption the Classic Way with Gflags and Windows Debugger

In my previous post I described my favorite way of troubleshooting unmanaged heap corruption with AppVerifier, DebugDiag and Windows Debugger. However, in a couple of incidents with a few customers in the last few months, that way of finding the culprit simply didn’t work for one reason or another, usually due to overhead from the AppVerifier rules. In those cases we needed to set up the full debug heap option via Gflags. In this post I will quickly show you this classic method.

First, let me remind you what heap corruption is and why it’s so difficult to troubleshoot.

Heap corruption is an undesired change in the data allocated by your program. Its symptoms include:

  • System errors, such as access violations.
  • Unexpected data in program output.
  • Unexpected paths of program execution.

Your program may show a symptom of heap corruption immediately or may delay it indefinitely, depending on the execution path through the program. It is important to note that the crashing call stack may be just a victim here: not the code that actually corrupted the heap, but code that simply “touched” the corrupted heap via a heap operation of some sort (an allocation via malloc, for example).

In general, the heap in Windows looks like this:


Heap is used for allocating and freeing objects dynamically for use by the program. Heap operations are called for when:

  1. The number and size of objects needed by the program are not known ahead of time.
  2. An object is too large to fit into a stack allocator.

Every process in Windows has one heap called the default heap. Processes can also have as many other dynamic heaps as they wish, simply by creating and destroying them on the fly. The system uses the default heap for all global and local memory management functions, and the C run-time library uses the default heap for supporting malloc functions. The heap memory functions, which indicate a specific heap by its handle, use dynamic heaps.

To debug heap corruption, you must identify both the code that allocated the memory involved and the code that deleted, released, or overwrote it. If the symptom appears immediately, you can often diagnose the problem by examining code near where the error occurred. Often, however, the symptom is delayed, sometimes for hours. In such cases, you must force a symptom to appear at a time and place where you can derive useful information from it.  A common way to do this is for you to command the operating system to insert a special suffix pattern into a small segment of extra memory and check that pattern when the memory is deleted. Another way is for the operating system to allocate extra memory after each allocation and mark it as Protected, which would cause the system to generate an access violation when it was accessed.

So the first tool we will set up is Gflags. Among other things, Gflags lets you enable heap debugging options for a program. Using GFlags, you can establish standard, /full, or /dlls heap options that will force the operating system to generate access violations and corruption errors when your program overwrites heap memory. Gflags is installed with the full Debugging Tools for Windows package from the Windows SDK. We will use the designated debugger option in Gflags for when an access violation is encountered, and we will set up Windows Debugger (WinDbg) as the designated debugger. To do that from the command line, run:

GFlags /p /enable MyBadProgram.exe /full /debug WinDbg.exe

Substitute your application for MyBadProgram.exe and run it until the error occurs.

The error might be an Access Violation (most likely), a Memory Check Error, or any other error type severe enough to force the operating system to attach the debugger to the process and bring it up at a breakpoint. Then you can either analyze the error online as in any other debug session, or evaluate the error offline. To evaluate it offline, create a dump on the fly via:

.dump /ma MyDump.dmp

Click Stop Debugging in the WinDbg toolbar. This will halt the program and empty the debug window.

Turn off heap checking:


GFlags /p /disable MyBadProgram.exe

Now open the resulting dump in Windows Debugger and analyze it. Here is an example; note that I am scrubbing the stack to “protect the innocent”, changing the real DLL name to BadDLL:

0:028> kpn
# ChildEBP RetAddr  
00 00a7de34 7763f659 ntdll!RtlReportCriticalFailure(long StatusCode = 0n-1073740940, void * FailureInfo = 0x77674270)+0x57 
01 00a7de44 7763f739 ntdll!RtlpReportHeapFailure(long ErrorLevel = 0n2)+0x21 
02 00a7de78 775ee045 ntdll!RtlpLogHeapFailure(_HEAP_FAILURE_TYPE FailureType = heap_failure_invalid_argument (0n9), void * HeapAddress = 0x003c0000, void * Address = 0x00000001, void * Param1 = 0x00000000, void * Param2 = 0x00000000, void * Param3 = 0x00000000)+0xa1 
03 00a7dea8 76826e6a ntdll!RtlFreeHeap(void * HeapHandle = 0x003c0000, unsigned long Flags = 0, void * BaseAddress = 0x00000001)+0x64
04 00a7debc 76826f54 ole32!CRetailMalloc_Free(struct IMalloc * pThis = 0x769166bc, void * pv = 0x00000001)+0x1c
05 00a7decc 4470189f ole32!CoTaskMemFree(void * pv = 0x00000001)+0x13 
06 00a7ded4 44405f9e BadDLL!ssmem_free+0xf
07 00a7def0 4441ade5 BadDLL+0x5f9e
08 00a7df0c 20e79c9d BadDLL!Init+0x255
09 00a7df84 20e638f6 BadDLL!DllUnregisterServer+0x39bd
0a 00a7dfe4 77639473 BadDLL+0x338f6

Hope this was helpful, and until next time.

Trouble in Distributed Cache Land – Windows AppFabric Cache Timeouts

Recently I had to help some customers troubleshoot periodic performance degradation and timeouts in Windows AppFabric. Example errors these customers would see were:

ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.

ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure.

In my earlier post I talked about troubleshooting and monitoring an AppFabric Cache cluster. Here I will add an obscure client configuration setting that may be useful in resolving these timeouts.

Client-to-server network contention: this is a quite well-known possibility. Here the netstat utility can be very useful. We can start with

netstat -a -n

Knowing that the default cache port is 22233, you can use further switches like:

netstat -a -n | find "TCP " | find /C "TIME_WAIT"

That gives you a count of the TCP connections in the TIME_WAIT state; to restrict the count to the cache port, filter on ":22233" instead of "TCP ". If we get large numbers in the TIME_WAIT state, it means there is port or network contention: the client is trying to establish too many connections, yet something blocks the client from establishing them. To fix client-to-server connection contention you may modify the client configuration, specifically that obscure maxConnectionsToServer parameter, which is 1 by default.

<dataCacheClient requestTimeout="15000" channelOpenTimeout="3000" maxConnectionsToServer="5"…>

If you are looking at a high-throughput scenario, then increasing this value beyond 1 is recommended. Also, be aware that if you have 5 cache servers in the cluster, the application uses 3 DataCacheFactories, and maxConnectionsToServer=3, then from each client machine there will be 9 outbound TCP connections to each cache server, 45 in total across all cache servers. Based on that you may wish to increase the value, but do so carefully: as stated above, it increases the number of TCP connections and therefore overhead as well. In general, with a singleton DataCacheFactory (as we recommend), I have seen pretty good results from a modest increase.
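
The connection arithmetic above is easy to get wrong when sizing a cluster, so here is a minimal sketch of it. The CacheConnections class and its method names are hypothetical and exist only to make the multiplication explicit.

```csharp
using System;

// Hypothetical helper that mirrors the arithmetic in the paragraph above.
public static class CacheConnections
{
    // Outbound TCP connections from one client machine to a single cache server.
    public static int PerServer(int factories, int maxConnectionsToServer)
    {
        return factories * maxConnectionsToServer;
    }

    // Outbound TCP connections from one client machine to the whole cluster.
    public static int Total(int cacheServers, int factories, int maxConnectionsToServer)
    {
        return cacheServers * PerServer(factories, maxConnectionsToServer);
    }
}
```

With 5 cache servers, 3 DataCacheFactories and maxConnectionsToServer=3, PerServer returns 9 and Total returns 45, matching the numbers above. With a singleton factory the same setting gives only 3 connections per server.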

Hope this helps.

Forecast Cloudy – NoSQL with Azure Table Storage Tutorial

Recently one of my customers started doing a lot more with Azure PaaS, so I started spending a lot more time in Azure as well. Being a data guy in general, I found some of the most interesting things to be SQL Azure databases and their scalability via Federations/Sharding approaches, but also the Azure PaaS NoSQL data storage and retrieval mechanisms, mainly Azure Table Storage and Blob Storage.

Windows Azure Tables are a non-relational, key-value-pair storage system suitable for storing massive amounts of unstructured data. Whereas relational stores such as SQL Server, with highly normalized designs, are optimized for storing data so that queries are easy to produce, non-relational stores like Table Storage are optimized for simple retrieval and fast inserts. The key here is exactly that: simple retrieval. You will not query tables on complex attributes to return large amounts of data, as they are not built for that; instead they are an easy way to retrieve a key-value pair record.

Table entities represent the units of data stored in a table and are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is a key/value pair defined by its name, value, and the value’s data type. Entities must define the following three system properties as part of the property collection:

  • PartitionKey – The PartitionKey property stores string values that identify the partition that an entity belongs to. This means that entities with the same PartitionKey values belong in the same partition. Partitions, as discussed later, are integral to the scalability of the table.
  • RowKey – The RowKey property stores string values that uniquely identify entities within each partition.
  • Timestamp – The Timestamp property provides traceability for an entity. A timestamp is a DateTime value that tells you the last time the entity was modified. A timestamp is sometimes referred to as the entity’s version. Modifications to timestamps are ignored because the table service maintains the value for this property during all inserts and update operations.

The primary key for any database table defines the columns that uniquely define each row. The same is true with Azure tables. The primary key for an Azure table is the combination of the PartitionKey and RowKey properties, which form a single clustered index within the table. The PartitionKey and RowKey properties can each store up to 1 KB of string values. Empty strings are allowed; null values are not. The clustered index is sorted by PartitionKey in ascending order and then by RowKey, also in ascending order. The sort order is observed in all query responses.

Partitions represent a collection of entities with the same PartitionKey values. Partitions are always served from one partition server and each partition server can serve one or more partitions. A partition server has a rate limit on the number of entities it can serve from one partition over time. Specifically, a partition has a scalability target of 500 entities per second. This throughput may be higher during minimal load on the storage node, but it will be throttled down when the storage node becomes hot or very active. MSDN has a very good graphical explanation of the partition and row key relationship:


Because a partition is always served from a single partition server and each partition server can serve one or more partitions, the efficiency of serving entities is correlated with the health of the server. Servers that encounter high traffic for their partitions may not be able to sustain a high throughput. For example, in the MSDN example above, if there are many requests for “2011 New York City Marathon__Full”, server B may become too hot. To increase the throughput of the server, the storage system load-balances the partitions to other servers. The result is that the traffic is distributed across many other servers. For optimal load balancing of traffic, you should use more partitions because it will allow the Azure Table service to distribute the partitions to more partition servers.
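
To get a feel for the ordering and partitioning described above, here is a small in-memory sketch. It does not talk to Azure at all. The KeyOrdering class is hypothetical, keys are modeled as (PartitionKey, RowKey) tuples, and ordinal string comparison is an assumption about how the lexical sort behaves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of table keys; Item1 is PartitionKey, Item2 is RowKey.
public static class KeyOrdering
{
    // Sorts by PartitionKey ascending, then by RowKey ascending.
    public static List<Tuple<string, string>> Sort(IEnumerable<Tuple<string, string>> keys)
    {
        return keys
            .OrderBy(k => k.Item1, StringComparer.Ordinal)
            .ThenBy(k => k.Item2, StringComparer.Ordinal)
            .ToList();
    }

    // Counts entities per partition, a quick way to spot a partition that may run hot.
    public static Dictionary<string, int> CountByPartition(IEnumerable<Tuple<string, string>> keys)
    {
        return keys.GroupBy(k => k.Item1).ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Given the keys (Smith, John), (Doe, Jerry) and (Doe, Anna), Sort returns (Doe, Anna), (Doe, Jerry), (Smith, John), and CountByPartition reports two entities in the Doe partition and one in Smith. A partition whose count dwarfs the others is a candidate for a finer-grained partition key.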

So how do I create a table, insert rows, and query Azure Tables? Let's do a quick walkthrough.

First I will go to my Azure Management Portal and create a new storage account. If you don’t have Azure benefits, you can sign up for a free trial; if you have an MSDN subscription from Microsoft, Azure benefits are included as well.

Clicking NEW at the bottom left gives you this screen:


You will now need to provide a name for your storage account in the URL textbox. This name is used as part of the URL for the service endpoint and so it must be globally unique. The portal will alert you if you select a name that is already in use. The next step is for you to select a location for your storage account by choosing one of the data center locations in the dropdown. This location will be the primary storage location for your data, or more simply, your account will reside in this Data Center. You may also see an additional dropdown if you have more than one Windows Azure subscription. This allows you to select the subscription that the account will be related to.

Azure storage accounts are stored in triplicate, with transactionally consistent copies in the primary data center. In addition to that redundancy, you can also choose to have “Geo Replication” enabled for the storage account. “Geo Replication” means that the Windows Azure Table data that you place into the account will be replicated in triplicate to another data center within the same region. So, if you select ‘East US’ for your primary storage location, your account will also have another full copy stored in the West US data center. Storage accounts that have Geo Replication enabled are referred to as geo redundant storage (GRS) and cost slightly more than accounts that do not have it enabled, which are called locally redundant storage (LRS).

Click on the ‘Manage Access Keys’ at the bottom of the screen to display the storage account name and two 512 bit storage access keys used to authenticate requests to the storage account. Whoever has these keys will have complete control over your storage account short of deleting the entire account. They would have the ability to upload BLOBs, modify table data and destroy queues. These account keys should be treated as a secret in the same way that you would guard passwords or a private encryption key.

Now that we have created a storage account and have the necessary access keys, let's write some code that will create an Azure table, connect to it, and retrieve data. We will fire up Visual Studio and code up a quick solution. I will create a new Cloud Services project with a new PaaS Worker Role.

Next I will add the NuGet package for Windows Azure Storage to my solution. Right-click the solution, choose Manage NuGet Packages, and pick Azure Storage:


Next, we need to set up a connection string in our solution using the keys from our storage account, and store it in configuration. You will get something like:

<connectionStrings>
  <add name="StorageConnectionString"
       connectionString="DefaultEndpointsProtocol=https;AccountName=gennadykstorage;AccountKey=somelargecrazystring" />
</connectionStrings>

If you are debugging your code locally in Visual Studio using the Storage Emulator, you can use the following as the connection string:

<Setting name="StorageConnectionString" value="UseDevelopmentStorage=true" />



Now let's go to our worker role's Run() method, connect to table storage programmatically, and create the table that will store our sample data. Add the following namespaces to your Worker Role class:

using System.Diagnostics;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;

Now let's connect:

                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));

                // Create the table client.
                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                // Create the table if it doesn't exist.
                CloudTable table = tableClient.GetTableReference("people");
                table.CreateIfNotExists();

So using our connection string (isn’t it a bit like SQL Server or Oracle?) we connected to the storage account, created a CloudTableClient from it, and through that client created a table called “people” if it didn’t already exist.

Next let's add a few sample rows. Note that my partition key here is the last name and my row key is the first name. In a real-world application you wouldn’t use such keys, because the combination of partition key and row key must be unique (see the explanation above); you would choose something like CustomerID, AccountID, or SSN instead.

To work easily with the data I will be adding and retrieving, I will create a CustomerEntity class that inherits from TableEntity and add it to my Worker Role project:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;

namespace WorkerRole1
{
    public class CustomerEntity : TableEntity
    {
        // A parameterless constructor is required so query results can be materialized.
        public CustomerEntity() { }

        public CustomerEntity(string lastName, string firstName)
        {
            this.PartitionKey = lastName;
            this.RowKey = firstName;
        }

        public string Email { get; set; }

        public string PhoneNumber { get; set; }
    }
}

That makes adding a record quite simple:

                // Create a new customer entity.
                CustomerEntity customer1 = new CustomerEntity("Doe", "Jerry");
                customer1.Email = "";
                customer1.PhoneNumber = "425-555-0101";

                // Create the TableOperation that inserts the customer entity.
                TableOperation insertOperation1 = TableOperation.InsertOrReplace(customer1);

                // Execute the insert operation.
                table.Execute(insertOperation1);

                // Create another customer entity.
                CustomerEntity customer2 = new CustomerEntity("Smith", "John");
                customer2.Email = "";
                customer2.PhoneNumber = "425-555-0101";

                // Create the TableOperation that inserts the customer entity.
                TableOperation insertOperation2 = TableOperation.InsertOrReplace(customer2);

                // Execute the insert operation.
                table.Execute(insertOperation2);

Looking at Server Explorer in VS I see my table and rows:


Now let's retrieve these rows using the partition key and row key. First let's query by filtering on a range, and write the results via Trace into Azure Diagnostics instead of the usual Console.WriteLine:

                // Retrieve a range of records where the last name is greater than "A" and the first name is greater than "B".
                string filterA = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThan, "A");
                string filterB = TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThan, "B");
                string combined = TableQuery.CombineFilters(filterA, TableOperators.And, filterB);
                TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(combined);

                // Loop through the results, writing information about each customer entity.
                foreach (CustomerEntity entity in table.ExecuteQuery(query))
                {
                    Trace.WriteLine(entity.PartitionKey + " " + entity.RowKey + " " + entity.Email + " " + entity.PhoneNumber);
                }

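If you are retrieving a single record, it is even easier:

                // Get just the one row matching a partition key and row key.
                // Create a retrieve operation that takes a customer entity.
                TableOperation retrieveOperation = TableOperation.Retrieve<CustomerEntity>("Smith", "John");

                // Execute the retrieve operation.
                TableResult retrievedResult = table.Execute(retrieveOperation);

                // Result is null when no entity matches the keys.
                if (retrievedResult.Result != null)
                {
                    CustomerEntity found = (CustomerEntity)retrievedResult.Result;
                    Trace.WriteLine(found.PartitionKey + " " + found.RowKey + " " + found.Email + " " + found.PhoneNumber);
                }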

Why would I use a NoSQL store like Azure Table Storage instead of SQL? In the case of Table Storage, all the usual NoSQL concepts apply:

  • no relations between tables (or entities sets)
  • entities are simply a set of key-value pairs
  • tables are schema-less (i.e. each entity can have a different schema, even in the same table)
  • there is limited support for keys and indexes within tables.

So Azure Table storage is great at storing non-relational key-value pair data. A huge rule I learned while working with it: never query tables on properties other than the partition key and row key, or you will see performance degradation. The idea is that when you use the partition and row keys, the storage service uses its (binary and distributed) index to find results really fast, while with other entity properties it has to scan everything sequentially, significantly reducing performance. So, querying on the partition key is good, querying on both the partition key and row key is good too, and querying only on the row key is not good (the storage service will fall back to sequential scans).
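
The difference between a key lookup and a scan can be illustrated with a toy in-memory model. This is only an analogy, not how the Table service is implemented. The Customer and InMemoryTable types are hypothetical, and RowsExamined is a made-up counter showing how much data each kind of query has to touch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for the toy model.
public class Customer
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Email { get; set; }
}

// Toy model: an index on (PartitionKey, RowKey) and nothing else.
public class InMemoryTable
{
    private readonly Dictionary<string, Customer> byKey = new Dictionary<string, Customer>();

    // Number of rows the most recent query had to look at.
    public int RowsExamined { get; private set; }

    public void Insert(Customer c)
    {
        byKey[c.PartitionKey + "\n" + c.RowKey] = c;
    }

    // Point lookup on both keys: a single index probe.
    public Customer Retrieve(string partitionKey, string rowKey)
    {
        RowsExamined = 1;
        Customer found;
        return byKey.TryGetValue(partitionKey + "\n" + rowKey, out found) ? found : null;
    }

    // Filter on a non-key property: every row has to be examined.
    public List<Customer> FindByEmail(string email)
    {
        RowsExamined = byKey.Count;
        return byKey.Values.Where(c => c.Email == email).ToList();
    }
}
```

With a thousand customers in the table, Retrieve examines one row while FindByEmail examines all thousand, which is the same shape of cost difference the rule above warns about.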


This has been different and fun, and I hope you find it useful.