Forecast Cloudy – Using Azure Blob Storage with Apache Hive on HDInsight

The beauty of working with Big Data in Azure is that you can manage (create\delete) compute resources with your HDInsight cluster independent of your data stored either in Azure Data Lake or Azure blob storage.  In this case I will concentrate on using Azure blob storage\WASB as data store for HDInsWight Azure PaaS Hadoop service

With a typical Hadoop installation you load your data to a staging location then you import it into the Hadoop Distributed File System (HDFS) within a single Hadoop cluster. That data is manipulated, massaged, and transformed. Then you may export some or all of the data back as resultset for consumption by other systems (think PowerBI, Tableau, etc)
Windows Azure Storage Blob (WASB) is an extension built on top of the HDFS APIs. The WASBS variation uses SSL certificates for improved security. It in many ways “is” HDFS. However, WASB creates a layer of abstraction that enables separation of storage. This separation is what enables your data to persist even when no clusters currently exist and enables multiple clusters plus other applications to access a single piece of data all at the same time. This increases functionality and flexibility while reducing costs and reducing the time from question to insight.


In Azure you store blobs on containers within Azure storage accounts. You grant access to a storage account, you create collections at the container level, and you place blobs (files of any format) inside the containers. This illustration from Microsoft’s documentation helps to show the structure:


Hold on, isn’t the whole selling point of Hadoop is proximity of data to compute?  Yes, and just like with any other Hadoop system on premises data is loaded into memory on the individual nodes at compute time. With Azure data infrastructure setup and data center backbone within data center built for performance, your job performance is generally the same or better than if you used disks locally attached to the VMs.

Below is diagram of HDInsight data storage architecture:


HDInsight provides access to the distributed file system that is locally attached to the compute nodes. This file system can be accessed by using the fully qualified URI, for example:


More important is ability access data that is stored in Azure Storage. The syntax is:


As per you need to be aware of following:

  • Container Security for WASB storage.  For containers in storage accounts that are connected to cluster,because the account name and key are associated with the cluster during creation, you have full access to the blobs in those containers. For public containers that are not connected to cluster you have read-only permission to the blobs in the containers.  For private containers in storage accounts that are not connected to cluster , you can’t access the blobs in the containers unless you define the storage account when you submit the WebHCat jobs.
  • The storage accounts that are defined in the creation process and their keys are stored in %HADOOP_HOME%/conf/core-site.xml on the cluster nodes. The default behavior of HDInsight is to use the storage accounts defined in the core-site.xml file. It is not recommended to directly edit the core-site.xml file because the cluster head node(master) may be reimaged or migrated at any time, and any changes to this file are not persisted.

 You can create new or point existing storage account to HDinsight cluster easy via portal as I show below:


You can point your HDInsight cluster to multiple storage accounts as well , as explained here – 

You can also create storage account and container via Azure PowerShell like in this sample:

$SubscriptionID = “<Your Azure Subscription ID>”
$ResourceGroupName = “<New Azure Resource Group Name>”
$Location = “EAST US 2”

$StorageAccountName = “<New Azure Storage Account Name>”
$containerName = “<New Azure Blob Container Name>”

Select-AzureRmSubscription -SubscriptionId $SubscriptionID

# Create resource group
New-AzureRmResourceGroup -name $ResourceGroupName -Location $Location

# Create default storage account
New-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName -Location $Location -Type Standard_LRS

# Create default blob containers
$storageAccountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -StorageAccountName $StorageAccountName)[0].Value
$destContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey
New-AzureStorageContainer -Name $containerName -Context $destContext


The URI scheme for accessing files in Azure storage from HDInsight is:


The URI scheme provides unencrypted access (with the wasb: prefix) and SSL encrypted access (with wasbs). Microsoft recommends using wasbs wherever possible, even when accessing data that lives inside the same region in Azure.

The <BlobStorageContainerName> identifies the name of the blob container in Azure storage. The <StorageAccountName> identifies the Azure Storage account name. A fully qualified domain name (FQDN) is required.

I ran into rather crazy little limitation\ issue when working with \WASB and HDInsight. Hadoop and Hive is looking for and  expects a valid folder hierarchy to import data  files, whereas  WASB does not support a folder hierarchy i.e. all blobs are listed under a container. The workaround is to use SSH session to login into head cluster node and use mkdir command line command to manually create such directory via the driver.

The SSH Procedure with HDInsight can be found here –

Another one recommended to me was that “/” character can be used within the key name to make it appear as if a file is stored within a directory structure. HDInsight sees these as if they are actual directories.For example, a blob’s key may be input/log1.txt. No actual “input” directory exists, but due to the presence of the “/” character in the key name, it has the appearance of a file path.

For more see –,

Hope this helps.


Forecast Cloudy–Azure Resource Manager Templates

Azure originally provided only the classic deployment model. In this model, each resource existed independently; there was no way to group related resources together. Instead, you had to manually track which resources made up your solution or application, and remember to manage them in a coordinated approach. To deploy a solution, you had to either create each resource individually through the classic portal or create a script that deployed all of the resources in the correct order. To delete a solution, you had to delete each resource individually. You could not easily apply and update access control policies for related resources. Finally, you could not apply tags to resources to label them with terms that help you monitor your resources and manage billing.

With the introduction of the new Azure Portal  its possible to view resources (such as websites, virtual machines and databases) as a single logical unit. This logical unit is called a resource group. The following screenshot shows a resource group called ecommerce-westus which contains a website, SQL database and an Application Insights instance.



Resource groups aren’t visible in the old portal, but almost everything within your Azure subscription exists within a set of resource groups that were created by default when the preview portal opened for business. If you access the new portal, press the Browse button on the jump bar (the icons down the left hand side of the screen) and navigate to resource groups you should see a whole bunch listed.

The benefit of resource groups is that they allow an Azure administrator to roll-up billing and monitoring information for resources within a resource group and manage access to those resources as a set. This can be extremely useful when you have a single subscription but you need to do cost recovery on the resources used by a customer or internal department.

For more on Azure Resource Manager see  docs – , ,

As an example I created a template in Visual Studio 2015 , this sample with ARM Template and parameters file that creates two Windows Server 2012R2 VM(s) with IIS configured using DSC. It also installs one SQL Server 2014 standard edition VM, a VNET with two subnets, NSG, loader balancer, NATing and probing rules.


To start I used one of the open source templates available on GitHub and just added\modified resource.

So how does one author these templates in VS 2015?

Prerequisites: You will need Visual Studio 2015 installed.  In my case I have VS 2015 Update 1 and Azure Tools SDK V2.8.2.

Create Azure Resource Group  Project.

When you first create an Azure Resource Group Project in Visual Studio you are presented with a list of common templates you can use to start defining your desired infrastructure. For example, if you wanted a set of Virtual Machines that could be used to run a scalable web server workload.


Project structure for Resource Group Project.

Using blank template project results in a project that contains everything you need to begin authoring and eventually deploy your template. As shown below, the project structure contains three folders; Scripts, Templates, and Tools.


The Scripts folder contains the PowerShell script that is used to deploy the environment you describe in the project. This is a fully functional script that essentially requires no editing. It just works! However, if you have a need to modify the script to support some advanced deployment scenarios you can do so.

The Templates folder contains the JSON files that describe the environment you want to create. The azuredeploy.json file is where you describe the resources you want in your environment, such as the storage account, virtual machine, and other required resources. The azuredeploy.parameters.json file is where you can provide parameter values that you want passed into the template during deployment, such as the type of storage account, size of the virtual machine, credentials to use to sign into the virtual machine, and so on.

The Tools folder contains AzCopy.exe and is used in scenarios where your deployment template needs to copy files to a storage account used for deploying application and configuration files into a resource. For a virtual machine, this typically would be Desired State Configuration (DSC) and DSC resources that you want pushed into the virtual machine after the virtual machine is provisioned to do things like configure Active Directory or IIS Web Server roles in the virtual machine. The script in the Scripts folder uses AzCopy in the Tools folder to copy these kinds of artifacts to a storage account so that ARM can then copy them from storage into the virtual machine after the instance is fully provisioned

More information available here – ,

In my case finished VS project structure looks like this


Note that AzureDeploy.json is a JSON template that describes all of the above mentioned infrastructure that I want to deploy in Azure, Deploy-AzureResourceGroup.ps1 is a PowerShell script to connect and deploy it to Azure, there is also parameters files with passwords, subscription names, etc, and finally test project to test such deploy in VS, all part of my test solution in Visual Studio.

So using my test template project  I created all of the above resources in resource group test in Central US data center:


Obviously this is just first baby steps, but should show you what is possible with this technology.  For more on Azure ARM Templates see –,,,

Hope this helps.

Iter in ignotus–Installing SQL Server vNext on Ubuntu Linux 16.10




Over the holidays I had a chance to install and run SQL Server vNext on my Ubuntu Linux machine. I did run into some issues during install, but was able to work around all of those and have SQL Server engine successfully running on Ubuntu 16.10 x64 .

To start out make sure that you are installing SQL Server vNext on Ubuntu Linux version 16 or higher and 64 bit. My first issue I ran into attempting to install on Ubuntu 14.x on 32 bit.

Here are steps to follow in Bash:

  1. Import the public repository GPG keys:
    curl | sudo apt-key add -
  2. Register the Microsoft SQL Server Ubuntu repository:
    curl | sudo tee /etc/apt/sources.list.d/mssql-server.list
  3. Run Update thru apt-get. If you didn’t update you may run into an error later, as I did
    sudo apt-get update
  4. Run actual install via apt-get again
    sudo apt-get install -y mssql-server

    If you are on 32 bit or didnt update source you can see this error

    Unable to locate package mssql 


  5. Now if you in latest Ubuntu x 64 with updated components tou should see install occuring in your terminal window. After the package installation finishes, run the configuration script and follow the prompts.
    sudo /opt/mssql/bin/sqlservr-setup

    Once the configuration is done, verify that the service is running

    systemctl status mssql-server

Now that install is done in 5 easy steps and SQL Services are running on Ubuntu Linux you can use SSMS on Windows to connect to your SQL Server on Linux. But in order to connect from Linux to this instance on Linux I will need to install SQL Client Tools for connectivity stack.


The following steps install the command-line tools, Microsoft ODBC drivers, and their dependencies. The mssql-tools package contains:

  1. sqlcmd: Command-line query utility
  2. bcp: Bulk import-export utility.

Install on Ubuntu in 3 easy steps

  1. Import the public repository GPG keys:
    curl | sudo apt-key add -
  2. Register the Microsoft Ubuntu Repository
    curl | sudo tee /etc/apt/sources.list.d/msprod.list
  3. Update again via apt-get
    sudo apt-get update
  4. Now run actual install
    sudo apt-get install mssql-tools

    If you are on 32 bit or did not update source you can see this error

    Unable to locate package mssql-tools

Now lets connect to our instance on local SQL and run a quick query again all via Terminal

sqlcmd -S localhost -U SA -P ''
SELECT @@version;

Well, that answers what I was doing over the Holidays. Happy bashing to you SQL folks


For more see –,

Meet Redis – Setting Up Redis On Ubuntu Linux


I have been asked by few folks on quick tutorial setting up Redis under systemd in Ubuntu Linux version 16.04.

I have blogged quite a bit about Redis in general – , however just a quick line on Redis in general. Redis is an in-memory key-value store known for its flexibility, performance, and wide language support. That makes Redis one of the most popular key value data stores in existence today. Below are steps to install and configure it to run under systemd in Ubuntu 16.04 and above.

Here are the prerequisites:

Next steps are:

  • Login into your Ubuntu server with this user account
  • Update and install prerequisites via apt-get
             $ sudo apt-get update
             $ sudo apt-get install build-essential tcl
  • Now we can download and exgract Redis to tmp directory
              $ cd /tmp
              $ curl -O
              $ tar xzvf redis-stable.tar.gz
              $ cd redis-stable
  • Next we can build Redis
        $ make
  • After the binaries are compiled, run the test suite to make sure everything was built correctly. You can do this by typing:
       $ make test
  • This will typically take a few minutes to run. Once it is complete, you can install the binaries onto the system by typing:
    $ sudo make install

Now we need to configure Redis to run under systemd. Systemd is an init system used in Linux distributions to bootstrap the user space and manage all processes subsequently, instead of the UNIX System V or Berkeley Software Distribution (BSD) init systems. As of 2016, most Linux distributions have adopted systemd as their default init system.

  • To start off, we need to create a configuration directory. We will use the conventional /etc/redis directory, which can be created by typing
    $ sudo mkdir /etc/redi
  • Now, copy over the sample Redis configuration file included in the Redis source archive:
         $ sudo cp /tmp/redis-stable/redis.conf /etc/redis
  • Next, we can open the file to adjust a few items in the configuration:
    $ sudo nano /etc/redis/redis.conf
  • In the file, find the supervised directive. Currently, this is set to no. Since we are running an operating system that uses the systemd init system, we can change this to systemd:
    . . .
    # If you run Redis from upstart or systemd, Redis can interact with your
    # supervision tree. Options:
    #   supervised no      - no supervision interaction
    #   supervised upstart - signal upstart by putting Redis into SIGSTOP mode
    #   supervised systemd - signal systemd by writing READY=1 to $NOTIFY_SOCKET
    #   supervised auto    - detect upstart or systemd method based on
    #                        UPSTART_JOB or NOTIFY_SOCKET environment variables
    # Note: these supervision methods only signal "process is ready."
    #       They do not enable continuous liveness pings back to your supervisor.
    supervised systemd
    . . .
  • Next, find the dir directory. This option specifies the directory that Redis will use to dump persistent data. We need to pick a location that Redis will have write permission and that isn’t viewable by normal users.
    We will use the /var/lib/redis directory for this, which we will create

    . . .
    # The working directory.
    # The DB will be written inside this directory, with the filename specified
    # above using the 'dbfilename' configuration directive.
    # The Append Only File will also be created inside this directory.
    # Note that you must specify a directory here, not a file name.
    dir /var/lib/redis
    . . .

    Save and close the file when you are finished

  • Next, we can create a systemd unit file so that the init system can manage the Redis process.
    Create and open the /etc/systemd/system/redis.service file to get started:

    $ sudo nano /etc/systemd/system/redis.service
  • The file will should like this, create sections below
    Description=Redis In-Memory Data Store
    ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf
    ExecStop=/usr/local/bin/redis-cli shutdown
  • Save and close file when you are finished

Now, we just have to create the user, group, and directory that we referenced in the previous two files.
Begin by creating the redis user and group. This can be done in a single command by typing:

$ sudo chown redis:redis /var/lib/redis

Now we can start Redis:

  $ sudo systemctl start redis

Check that the service had no errors by running:

$ sudo systemctl status redis

And Eureka – here is the response

redis.service - Redis Server
   Loaded: loaded (/etc/systemd/system/redis.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2016-05-11 14:38:08 EDT; 1min 43s ago
  Process: 3115 ExecStop=/usr/local/bin/redis-cli shutdown (code=exited, status=0/SUCCESS)
 Main PID: 3124 (redis-server)
    Tasks: 3 (limit: 512)
   Memory: 864.0K
      CPU: 179ms
   CGroup: /system.slice/redis.service
           └─3124 /usr/local/bin/redis-server    

Congrats ! You can now start learning Redis. Connect to Redis CLI by typing

$ redis-cli

Now you can follow these Redis tutorials

Hope this was helpful

Introducing Microsoft Azure Service Fabric – a groundbreaking PaaS Microservices Platform in Microsoft Azure and On Premises


Azure Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices. Service Fabric also addresses the significant challenges in developing and managing cloud applications. Developers and administrators can avoid complex infrastructure problems and focus on implementing mission-critical, demanding workloads that are scalable, reliable, and manageable. Service Fabric represents the next-generation middleware platform for building and managing these enterprise-class, tier-1, cloud-scale applications.

That’s great definition, but what does it exactly means? Service Fabric is a base for new type of enterprise Services based application on premises and cloud with built in scalability possibilities, fault tolerance , multi OS deployment and containerization.

There are two main reasons why Service Fabric is able to achieve such massive scalability for applications. The first has to do with application design. Modern, highly scalable applications are generally built around the use of microservices.

The term “microservices” has been thrown around a lot over the last two or three years and has taken on several different meanings. From a Service Fabric prospective, microservices refer to independently deployable services upon which developers can build their applications. Microservices can be almost anything. Some common examples of microservices include protocol gateways, queues, caches, shopping carts, user profiles and inventory processing services.

The other thing that makes it possible for applications to achieve such a large scale when leveraging Service Fabric is the Service Fabric cluster. The individual microservices reside inside containers, and those containers, in turn, are deployed across a Service Fabric cluster. A Service Fabric cluster might contain hundreds of servers and tens of thousands of containers. The Service Fabric can scale to such an extent that Microsoft uses it to run widely used applications such as Cortana, Skype for Business, Microsoft Intune and Power BI.

The foundation of Azure Service Fabric is the System Services that provide the underlying support for customer’s applications.  These services include the following:

  • Failover manager ensures of availability by shifting resources within the cluster including when resources are added or removed.
  • Cluster manager interacts with Failover manager to ensure application and service constraints are not violated.
  • Naming Service provides name resolution services to ensure all services within the application are accessible.  Since the workloads are dynamic, client applications are not expected to understand what underlying infrastructure is supporting a particular service.  The Naming Service will facilitate routing between clients and the underlying service.
  • File store service provides local data and assembly persistence across nodes in the service




Diagram below shows major subsystems of Service Fabric


  • Management Subsystem – manages lifecycle of applications and services
  • · Testability Subsystem – help devs test services through simulated faults before and after deploying applications and services to production
  • · Communication Subsystem – resolve service locations
  • · Reliabilty Subsystem – responsible for reliability through replication, resource management and failover
  • · Hosting and Activation – manages lifecycle of an application on a single node
  • · Application Model – enables tooling
  • · Native and Managed APIs – exposed to devs


On top of the Service Fabric Cluster, customers can deploy two different types of applications including:

  • Stateless where application state is stored out-of- band such as Azure SQL Database or Azure Storage.
  • Stateful where state is replicated to local persistence.  As a result there is a reduction in latency and complexity compared to traditional three tier architectures where developers are typically left to implement state logic themselves.

The Azure Service Fabric platform is responsible for the orchestration of these microservices and the microservices should not have any affinity to a particular node or infrastructure. The following image illustrates how a developer may choose to deploy their application as a series of microservices.  Should a node disappear, it is the responsibility of the Azure Service Fabric platform to ensure the microservice(s) are brought up on a remaining node.



Getting started with Service Fabric is relatively easy. To start off, you’re going to need a PC running a supported OS (Windows 8, Windows 8.1, Windows Server 2012 R2 or Windows 10) and a copy of Microsoft Visual Studio 2015. You’ll use this PC to install the required runtime SDK and to set up a local cluster. If you don’t have a suitable PC or if your Visual Studio licenses are in short supply, then you might consider using an Azure virtual machine. The Azure virtual machine gallery contains the option to deploy a virtual machine that runs Visual Studio 2015 Enterprise.

Once you have Visual Studio 2015 up and running, the next thing you’ll need to do is download and install the runtime components, the SDK and the required tools. You can download these from –

The good news on Service Fabric development front as well is that VSW 2015 already provides you with number of templates to start development. Templates are classified as Reliable Services, Reliable Actors and Web. If your goal is to build an application that’s based on the use of microservices, then you’ll want to choose either a stateless or a stateful reliable service (yes, stateful services are fully supported).

Once you install Service Fabric SDK and Tools on your machine it will install local Service Fabric Cluster as you can see via icon on your taskbar below.You will create and start the cluster and once you can manage it you should be able to see management screen like this



Lets create our first very simple application for Service Fabric. A Service Fabric application can contain one or more services, each with a specific role in delivering the application’s functionality. Create an application project, along with your first service project, using the New Project wizard. You can also add more services later if you want.

  • Launch Visual Studio as an administrator
  • Click File > New Project > Cloud > Service Fabric Application.
  • Name the application and click OK


  • On the next page, choose Stateful as the first service type to include in your application. Name it and click OK.
  • visual Studio will create appropriate project for Service Fabric Stateful Service.



Once you press F5 in Visual Studio this application will be deployed for debugging When the cluster is ready, you get a notification from the local cluster system tray manager application included with the SDK.



If you have added no custom code at all, in the case of the stateful service template, the messages simply show the counter value incrementing in the RunAsync method of MyStatefulService.cs.

Code as you can see is pretty self explanatory


protected override async Task RunAsync(CancellationToken cancellationToken)
            // TODO: Replace the following sample code with your own logic 
            //       or remove this RunAsync override if it's not needed in your service.

            var myDictionary = await this.StateManager.GetOrAddAsync>("myDictionary");

            while (true)

                using (var tx = this.StateManager.CreateTransaction())
                    var result = await myDictionary.TryGetValueAsync(tx, "Counter");

                    ServiceEventSource.Current.ServiceMessage(this, "Current Counter Value: {0}",
                        result.HasValue ? result.Value.ToString() : "Value does not exist.");

                    await myDictionary.AddOrUpdateAsync(tx, "Counter", 0, (key, value) => ++value);

                    // If an exception is thrown before calling CommitAsync, the transaction aborts, all changes are 
                    // discarded, and nothing is saved to the secondary replicas.
                    await tx.CommitAsync();

                await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);

it’s important to remember that the local cluster is real. Stopping the debugger removes your application instance and unregisters the application type. The cluster continues to run in the background, however. You have several options to manage the cluster:

To shut down the cluster but keep the application data and traces, click Stop Local Cluster in the system tray app.To delete the cluster entirely, click Remove Local Cluster in the system tray app.

In the near future I will blog more about Service Fabric, Reliable Services and finally creating and deploying to cluster in Azure, So stay tuned please.

For more on Service Fabric see below: ,,,

My Great Guardian – Watching Redis With Sentinel




Redis Sentinel provides high availability for Redis. If you ever ran SQL Server mirroring or Oracle Golden Gate the concept should be somewhat familiar to you. To start you need to have Redis replication configured with master and N number slaves. From there, you have Sentinel daemons running, be it on your application servers or on the servers Redis is running on. These keep track of the master’s health.

Redis Sentinel provides high availability for Redis. If you ever ran SQL Server mirroring or Oracle Golden Gate the concept should be somewhat familiar to you. To start you need to have Redis replication configured with master and N number slaves. From there, you have Sentinel daemons running, be it on your application servers or on the servers Redis is running on. These keep track of the master’s health.



How does the failover work? Sentinel actually failover by rewriting configuration (conf) files for Redis instances that are running, I already mentioned SLAVEOF command before –, so by rewriting this command failover is achieved

Say we have a master “A” replicating to slaves “B” and “C”. We have three Sentinels (s1, s2, s3) running on our application servers, which write to Redis. At this point “A”, our current master, goes offline. Our sentinels all see “A” as offline, and send SDOWN messages to each other. Then they all agree that “A” is down, so “A” is set to be in ODOWN status. From here, an election happens to see who is most ahead, and in this case “B” is chosen as the new master.

The config file for “B” is set so that it is no longer the slave of anyone. Meanwhile, the config file for “C” is rewritten so that it is no longer the slave of “A” but rather “B.” From here, everything continues on as normal. Should “A” come back online, the Sentinels will recognize this, and rewrite the configuration file for “A” to be the slave of “B,” since “B” is the current master.

The current version of Sentinel is called Sentinel 2. It is a rewrite of the initial Sentinel implementation using stronger and simpler to predict algorithms (that are explained in this documentation).

A stable release of Redis Sentinel is shipped since Redis 2.8. Redis Sentinel version 1, shipped with Redis 2.6, is deprecated and should not be used.

When configuring Sentinel you need to take time and decide where you want to run Sentinel processes. Many folks recommend running those on your application servers. Presumably if you’re setting this up, you’re concerned about write availability to your master. As such, Sentinels provide insight to whether or not your application server can talk to the master. However a lot of folks decide to run Sentinel processes in their Redis instance servers amd that makes sense as well.

If you are using the redis-sentinel executable (or if you have a symbolic link with that name to the redis-server executable) you can run Sentinel with the following command line:

redis-sentinel /path/to/sentinel.conf

Otherwise you can use directly the redis-server executable starting it in Sentinel mode:

redis-server /path/to/sentinel.conf --sentinel

You have to use configuration file when running Sentinel (sentinel.conf) which is separate from Redis configuration file (redis.conf) and this file this file will be used by the system in order to save the current state that will be reloaded in case of restarts. Sentinel will simply refuse to start if no configuration file is given or if the configuration file path is not writable.

By default , Sentinel listens on TCP port 26379, so for Sentinels to work, port 26379 of your servers must be open to receive connections from the IP addresses of the other Sentinel instances. Otherwise Sentinels can’t talk and can’t agree about what to do, so failover will never be performed.



Some important items to remember on Sentinel

1. You need at least three Sentinel instances for a robust deployment.

2. As per Redis docs, three Sentinel instances should be placed into computers or virtual machines that are believed to fail in an independent way. So for example different physical servers or Virtual Machines executed on different availability zones or application fault domains

3. Sentinel + Redis distributed system does not guarantee that acknowledged writes are retained during failures, since Redis uses asynchronous replication. However there are ways to deploy Sentinel that make the window to lose writes limited to certain moments, while there are other less secure ways to deploy it.

4. You need Sentinel support in your clients. Popular client libraries have Sentinel support, but not all.

5. Test your setup so you know it works. Otherwise you cannot be sure in its performance

Basically. Initial setup expects all nodes running as a master with replication on, with manual set slaveof ip port in redis-cli on futire redis slaves. Then run sentinel and it does the rest.

Minimal redis.conf configuration file looks like this

daemonize yes
pidfile /usr/local/var/run/
port 6379
timeout 0
loglevel notice
logfile /opt/redis/redis.log
databases 1
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename master.rdb
dir /usr/local/var/db/redis/
slave-serve-stale-data yes
slave-read-only no
slave-priority 100
maxclients 2048
maxmemory 256mb
# act as binary log with transactions
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

Minimal sentinel.conf configuration file looks like this

port 17700
daemonize yes
logfile "/opt/redis/sentinel.log"
sentinel monitor master 6379 2
sentinel down-after-milliseconds master 4000
sentinel failover-timeout master 180000
sentinel parallel-syncs master 4

Start all of your redis nodes with redis config and choose master. Then run redis console and set all other nodes as a slave of given master, using command slaveof <ip address 6379>

Start all of your redis nodes with redis config and choose master. Then run redis console and set all other nodes as a slave of given master, using command slaveof <ip address 6379>. Then you can connect to your master and verify, if there are all of your slave nodes, connected and syncing – run info command in your master redis console. Output should show you something like this


To test, if your sentinel works, just shutdown your redis master and watch sentinel log. You should see something like this

[17240] 04 Dec 07:56:16.289 # +sdown master master 6379
[17240] 04 Dec 07:56:16.551 # +new-epoch 1386165365
[17240] 04 Dec 07:56:16.551 # +vote-for-leader 185301a20bdfdf1d5316f95bae0fe1eb544edc58 1386165365
[17240] 04 Dec 07:56:17.442 # +odown master master 6379 #quorum 4/2
[17240] 04 Dec 07:56:18.489 # +switch-master master 6379 6379
[17240] 04 Dec 07:56:18.489 * +slave slave 6379 @ master 6379
[17240] 04 Dec 07:56:18.490 * +slave slave 6379 @ master 6379
[17240] 04 Dec 07:56:28.680 * +convert-to-slave slave 6379 @ master 6379

What is also important to note that latest builds on MSOpenStack Redis for Windows have implemented Sentinel as well. As per , You could use the following command line to install a sentinel
instance as a service:

redis-server --service-install --service-name Sentinel1
sentinel.1.conf --sentinel*

In this case the arguments passed to the service instance will be “*sentinel.1.conf

Make sure of following

1. The configuration file must be the last parameter of the command line. If another parameter was last, such as –service-name, it would run fine when invoked the command line but would consistently fail went started as a service.

2. Since the service installs a Network Service by default, ensure that it has access to the directory where the log file will be written.

For more on Sentinel see official Redis docs –,,, ,

Meet Memcached in the Clouds – Setting Up Memcached as a Service via Amazon Elastic Cache



In my previous post I introduced you to Memcached, in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. This continued my interest in in-memory NoSQL cache systems like AppFabric Cache and Redis.

Both Redis and Memcached are offered by AWS as a cloud PaaS service called ElastiCace. With ElastiCache, you can quickly deploy your cache environment, without having to provision hardware or install software.You can choose from Memcached or Redis protocol-compliant cache engine software, and let ElastiCache perform software upgrades and patch management for you automatically. For enhanced security, ElastiCache runs in the Amazon Virtual Private Cloud (Amazon VPC) environment, giving you complete control over network access to your cache cluster.With just a few clicks in the AWS Management Console, you can add resources to your ElastiCache environment, such as additional nodes or read replicas, to meet your business needs and application requirements.

Existing applications that use Memcached or Redis can use ElastiCache with almost no modification; your applications simply need to know the host names and port numbers of the ElastiCache nodes that you have deployed.The ElastiCache Auto Discovery feature lets your applications identify all of the nodes in a cache cluster and connect to them, rather than having to maintain a list of available host names and port numbers; in this way, your applications are effectively insulated from changes to cache node membership.

Before I show you how to setup AWS ElastiCache cluster lets go through basics:

Data Model

The Amazon ElastiCache data model concepts include cache nodes, cache clusters, security configuration, and replication groups. The ElastiCache data model also includes resources for event notification and performance monitoring; these resources complement the core concepts.

Cache Nodes and Cluster

A cache node is the smallest building block of an ElastiCache deployment. Each node has its own memory, storage and processor resources, and runs a dedicated instance of cache engine software — either Memcached or Redis. ElastiCache provides a number of different cache node configurations for you to choose from, depending on your needs.You can use these cache nodes on an on-demand basis, or take advantage of reserved cache nodes at significant cost savings.

A cache cluster is a collection of one or more cache nodes, each of which runs its own instance of supported cache engine software.You can launch a new cache cluster with a single ElastiCache operation (CreateCacheCluster), specifying the number of cache nodes you want and the runtime parameters for the cache engine software on all of the nodes. Each node in a cache cluster has the same compute, storage and memory specifications, and they all run the same cache engine software (Memcached or Redis).The ElastiCache API lets you control cluster-wide attributes, such as the number of cache nodes, security settings, version upgrades, and system maintenance windows.

Cache parameter groups are an easy way to manage runtime settings for supported cache engine software. Memcached has many parameters to control memory usage, cache eviction policies, item sizes, and more; a cache parameter group is a named collection of Memcached specific parameters that you can apply to a cache cluster

. Memcached clusters contain from 1 to 20 nodes across which you can horizontally partition your data


To create cluster via ElastiCache console follow these steps:

1. Open  the Amazon ElastiCache console at

2. Pick Memcached from Dashboard on the left

3. Choose Create

4. Complete Settings Section



As you enter setting please note following:

1. In the Name enter desired cluster name. Remember, it must begin with the letter and can contain 1 to 20 alphanumeric characters, however cannot have two consecutive hyphens nor end with the hyphen

2. In Port you can accept default at 11211. If you have a reason to use a different port, type the port number.

3. For Parameter group, choose the default parameter group, choose the parameter group you want to use with this cluster, or choose Create new to create a new parameter group to use with this cluster.

4. For Number of nodes, choose the number of nodes you want for this cluster. You will partition your data across the cluster’s nodes.If you need to change the number of nodes later, scaling horizontally is quite easy with Memcached

5. Choose how you want the Availability zone(s) selected for this cluster. You have two options

  1. No Preference. ElastiCache selects availability zone for each node in your cluster
  2. Specify availability zones. Specify availability zone for each node in your cluster.

6. For Security groups, choose the security groups you want to apply to this cluster.

7. The Maintenance window is the time, generally an hour in length, each week when ElastiCache schedules system maintenance for your cluster. You can allow ElastiCache choose the day and time for your maintenance window (No preference), or you can choose the day, time, and duration yourself

8. Now check all of the settings and pick Create


More information on Memcached specific parameters you can set up on your ElastiCache cluster see here –

For clusters running the Memcached engine, ElastiCache supports Auto Discovery—the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes.From the application’s point of view, connecting to the cluster configuration endpoint is no different from connecting directly to an individual cache node

Process of Connecting to Cache Nodes

1. The application resolves the configuration endpoint’s DNS name. Because the configuration endpoint maintains CNAME entries for all of the cache nodes, the DNS name resolves to one of the nodes; the client can then connect to that node.

2. The client requests the configuration information for all of the other nodes. Since each node maintains configuration information for all of the nodes in the cluster, any node can pass configuration information to the client upon request.

3. The client receives the current list of cache node hostnames and IP addresses. It can then connect to all of the other nodes in the cluster.

The configuration information for Auto Discovery is stored redundantly in each cache cluster node. Client applications can query any cache node and obtain the configuration information for all of the nodes in the cluster.

For more information see –,,

Hope this helps.