Data Cargo In The Clouds – Data and Containerization On The Cloud Platforms

container

This is a topic I have planned to pivot to for quite a while, but I either had no bandwidth or time with the move to Seattle, or some other excuse like that. The topic is interesting to me because it sits at the intersection of technologies where I have spent quite a bit of time, including:

  • Data, including both traditional RDBMS databases and NoSQL designs
  • Cloud, including Microsoft Azure, AWS and GCP
  • Containers and container orchestration (an area that is new to me)

Those who have worked with me have shown varying degrees of agreement, but more often annoyance, with me pushing this topic as a growing part of the data engineering discipline, yet I truly believe in it. Before we start combining all these areas and doing something useful in code, I will spend some time here on the most basic concepts of the space I am about to enter.

I will skip an introduction to the basic concepts of RDBMS, NoSQL, or Big Data/Hadoop technologies; unless you have been hiding under a rock for the last five years, you should be quite familiar with those. That brings us straight to containers.

As the definition states – “A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments and help reduce conflicts between teams running different software on the same infrastructure.”

Containers are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. This could be from a developer’s laptop to a test environment, from a staging environment into production, and perhaps from a physical machine in a data center to a virtual machine in a private or public cloud.

container.jpg

But why containers if I already have VMs? 

VMs take up a lot of system resources. Each VM runs not just a full copy of an operating system, but a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles. In contrast, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program.
What this means in practice is that you can put two to three times as many applications on a single server with containers as you can with VMs. In addition, with containers you can create a portable, consistent operating environment for development, testing, and deployment.

On Linux, a common foundation for containers is LXC, a userspace interface for the Linux kernel's containment features (namespaces and control groups). It includes an application programming interface (API) that enables Linux users to create and manage system or application containers.

Docker is an open platform that makes it easier to create, deploy, and run applications using containers.

  • The heart of Docker is the Docker Engine, the part of Docker that creates and runs Docker containers. A Docker container is a live, running instance of a Docker image. Docker Engine is a client-server application with the following components:
    • A server, which is a continuously running service called the daemon process.
    • A REST API that programs use to talk to the daemon and instruct it what to do.
    • A command line interface (CLI) client.

             docker_engine 

  • The command line interface client uses the Docker REST API to interact with the Docker daemon through CLI commands. Many other Docker applications also use the API and CLI. The daemon process creates and manages Docker images, containers, networks, and volumes.
  • The Docker client is the primary way Docker users communicate with Docker. When you run a command such as docker run, the client sends it to dockerd, which carries it out.

You build Docker images using Docker and publish them to what are known as Docker registries. When you run the docker pull and docker run commands, the required images are pulled from your configured registry; using the docker push command, an image can be uploaded to that registry.

Finally, by running instances of the images from your registry you deploy containers. You can create, run, stop, or delete a container using the Docker CLI, connect a container to one or more networks, or even create a new image based on its current state. By default, a container is well isolated from other containers and from its host machine. A container is defined by its image and by the configuration options you provide when you create or run it.
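Since the rest of this series leans on .NET, here is a minimal sketch of driving that same workflow programmatically instead of through the CLI. It assumes the community Docker.DotNet client library and a local daemon listening on the default Unix socket; the endpoint URI is a placeholder and would be npipe://./pipe/docker_engine on Windows.

using System;
using System.Threading.Tasks;
using Docker.DotNet;          // community .NET client for the Docker Engine REST API
using Docker.DotNet.Models;

class DockerSketch
{
    public static async Task Main()
    {
        // Connect to the local daemon (placeholder endpoint)
        var client = new DockerClientConfiguration(new Uri("unix:///var/run/docker.sock"))
            .CreateClient();

        // Roughly equivalent to "docker images": list locally cached images
        var images = await client.Images.ListImagesAsync(new ImagesListParameters());
        foreach (var image in images)
            Console.WriteLine(string.Join(",", image.RepoTags ?? new string[0]));

        // Roughly equivalent to "docker ps -a": list all containers, running or stopped
        var containers = await client.Containers.ListContainersAsync(
            new ContainersListParameters { All = true });
        foreach (var container in containers)
            Console.WriteLine(container.ID + " " + container.Image + " " + container.State);
    }
}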

For more info on containers, LXC, and Docker see – https://blog.scottlowe.org/2013/11/25/a-brief-introduction-to-linux-containers-with-lxc/, https://www.docker.com/what-container#/virtual_machines, http://searchservervirtualization.techtarget.com/definition/container-based-virtualization-operating-system-level-virtualization

That brings us to container orchestration engines. While the CLI meets the needs of managing one container on one host, it falls short when it comes to managing multiple containers deployed on multiple hosts. To go beyond the management of individual containers, we must turn to orchestration tools. Orchestration tools extend lifecycle management capabilities to complex, multi-container workloads deployed on a cluster of machines.

Some of the well known orchestration engines include:

Kubernetes. Almost a de facto standard nowadays, originally developed at Google. Kubernetes' architecture is based on a master server with multiple worker nodes (historically called minions). The command line tool, kubectl, connects to the API endpoint of the master to manage and orchestrate the nodes. The service definition, along with the rules and constraints, is described in a JSON or YAML file. For service discovery, Kubernetes provides a stable IP address and DNS name that corresponds to a dynamic set of pods. When a container running in a Kubernetes pod connects to this address, the connection is forwarded by a local agent (called the kube-proxy) running on the source machine to one of the corresponding backend containers. Kubernetes supports user-implemented application health checks. These checks are performed by the kubelet running on each node to ensure that the application is operating correctly.
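Staying with .NET for consistency, here is a minimal sketch of talking to that master API endpoint from code. It assumes the official Kubernetes C# client (NuGet package KubernetesClient) and a kubeconfig file already present on the machine; the namespace queried is just an example.

using System;
using k8s;   // official Kubernetes C# client (NuGet: KubernetesClient)

class KubeSketch
{
    public static void Main()
    {
        // Load the cluster address and credentials from the local kubeconfig file
        var config = KubernetesClientConfiguration.BuildConfigFromConfigFile();
        IKubernetes client = new Kubernetes(config);

        // List the pods behind services in the "default" namespace
        var pods = client.ListNamespacedPod("default");
        foreach (var pod in pods.Items)
        {
            Console.WriteLine(pod.Metadata.Name + " on node " + pod.Spec.NodeName);
        }
    }
}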

kub

For more see – https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/#kubernetes-is

Apache Mesos. This is an open source cluster manager that simplifies the complexity of running tasks on a shared pool of servers. A typical Mesos cluster consists of one or more servers running the mesos-master and a cluster of servers running the mesos-slave component. Each slave is registered with the master to offer resources. The master interacts with deployed frameworks to delegate tasks to slaves. Unlike other tools, Mesos ensures high availability of the master nodes using Apache ZooKeeper, which replicates the masters to form a quorum. A high availability deployment requires at least three master nodes. All nodes in the system, including masters and slaves, communicate with ZooKeeper to determine which master is the current leading master. The leader performs health checks on all the slaves and proactively deactivates any that fail.

 

Over the next couple of posts I will create containers running SQL Server and other data stores, both RDBMS and NoSQL, deploy these to the cloud, and finally attempt to orchestrate them as well. So let's move from theory and pictures into the world of practical data engine deployments.

Hope you will find this detour interesting.


Meet Memcached in the Clouds – Setting Up Memcached as a Service via Amazon Elastic Cache

aws

 

In my previous post I introduced you to Memcached, an in-memory key-value store for small chunks of arbitrary data (strings, objects) such as the results of database calls, API calls, or page rendering. This continues my interest in in-memory NoSQL cache systems like AppFabric Cache and Redis.

Both Redis and Memcached are offered by AWS as a cloud PaaS service called ElastiCache. With ElastiCache, you can quickly deploy your cache environment without having to provision hardware or install software. You can choose the Memcached or Redis protocol-compliant cache engine and let ElastiCache perform software upgrades and patch management for you automatically. For enhanced security, ElastiCache runs in the Amazon Virtual Private Cloud (Amazon VPC) environment, giving you complete control over network access to your cache cluster. With just a few clicks in the AWS Management Console, you can add resources to your ElastiCache environment, such as additional nodes or read replicas, to meet your business needs and application requirements.

Existing applications that use Memcached or Redis can use ElastiCache with almost no modification; your applications simply need to know the host names and port numbers of the ElastiCache nodes that you have deployed. The ElastiCache Auto Discovery feature lets your applications identify all of the nodes in a cache cluster and connect to them, rather than having to maintain a list of available host names and port numbers; in this way, your applications are effectively insulated from changes to cache node membership.

Before I show you how to set up an AWS ElastiCache cluster, let's go through the basics:

Data Model

The Amazon ElastiCache data model concepts include cache nodes, cache clusters, security configuration, and replication groups. The ElastiCache data model also includes resources for event notification and performance monitoring; these resources complement the core concepts.

Cache Nodes and Cluster

A cache node is the smallest building block of an ElastiCache deployment. Each node has its own memory, storage, and processor resources, and runs a dedicated instance of cache engine software — either Memcached or Redis. ElastiCache provides a number of different cache node configurations for you to choose from, depending on your needs. You can use these cache nodes on an on-demand basis, or take advantage of reserved cache nodes at significant cost savings.

A cache cluster is a collection of one or more cache nodes, each of which runs its own instance of supported cache engine software. You can launch a new cache cluster with a single ElastiCache operation (CreateCacheCluster), specifying the number of cache nodes you want and the runtime parameters for the cache engine software on all of the nodes. Each node in a cache cluster has the same compute, storage, and memory specifications, and they all run the same cache engine software (Memcached or Redis). The ElastiCache API lets you control cluster-wide attributes, such as the number of cache nodes, security settings, version upgrades, and system maintenance windows.
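To make that CreateCacheCluster call concrete, here is a rough sketch using the ElastiCache client from the AWS SDK for .NET (the AWSSDK.ElastiCache package); the cluster name, node type, region, and sizing below are placeholder values, not recommendations.

using System;
using Amazon;
using Amazon.ElastiCache;
using Amazon.ElastiCache.Model;

class CreateClusterSketch
{
    public static void Main()
    {
        var client = new AmazonElastiCacheClient(RegionEndpoint.USEast1);

        // Launch a small three-node Memcached cluster (all values are placeholders)
        var request = new CreateCacheClusterRequest
        {
            CacheClusterId = "my-memcached-demo",
            Engine = "memcached",
            CacheNodeType = "cache.t2.micro",
            NumCacheNodes = 3,
            Port = 11211
        };

        var response = client.CreateCacheCluster(request);
        Console.WriteLine("Cluster status: " + response.CacheCluster.CacheClusterStatus);
    }
}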

Cache parameter groups are an easy way to manage runtime settings for supported cache engine software. Memcached has many parameters to control memory usage, cache eviction policies, item sizes, and more; a cache parameter group is a named collection of Memcached-specific parameters that you can apply to a cache cluster. Memcached clusters contain from 1 to 20 nodes, across which you can horizontally partition your data.

ElastiCache-Clusters

To create a cluster via the ElastiCache console, follow these steps:

1. Open the Amazon ElastiCache console at https://console.aws.amazon.com/elasticache/.

2. Pick Memcached from the Dashboard on the left.

3. Choose Create.

4. Complete the Settings section.

 

memcached2

As you enter the settings, please note the following:

1. In Name, enter the desired cluster name. Remember, it must begin with a letter and can contain 1 to 20 alphanumeric characters; it cannot have two consecutive hyphens nor end with a hyphen.

2. For Port, you can accept the default of 11211. If you have a reason to use a different port, type the port number.

3. For Parameter group, choose the default parameter group, choose the parameter group you want to use with this cluster, or choose Create new to create a new parameter group to use with this cluster.

4. For Number of nodes, choose the number of nodes you want for this cluster. You will partition your data across the cluster's nodes. If you need to change the number of nodes later, scaling horizontally is quite easy with Memcached.

5. Choose how you want the availability zone(s) selected for this cluster. You have two options:

  1. No Preference. ElastiCache selects the availability zone for each node in your cluster.
  2. Specify availability zones. You specify the availability zone for each node in your cluster.

6. For Security groups, choose the security groups you want to apply to this cluster.

7. The Maintenance window is the time, generally an hour in length, each week when ElastiCache schedules system maintenance for your cluster. You can let ElastiCache choose the day and time for your maintenance window (No preference), or you can choose the day, time, and duration yourself.

8. Now check all of the settings and pick Create.

memcached3

For more information on Memcached-specific parameters you can set on your ElastiCache cluster, see http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ParameterGroups.Memcached.html

For clusters running the Memcached engine, ElastiCache supports Auto Discovery — the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes. From the application's point of view, connecting to the cluster configuration endpoint is no different from connecting directly to an individual cache node.

Process of Connecting to Cache Nodes

1. The application resolves the configuration endpoint’s DNS name. Because the configuration endpoint maintains CNAME entries for all of the cache nodes, the DNS name resolves to one of the nodes; the client can then connect to that node.

2. The client requests the configuration information for all of the other nodes. Since each node maintains configuration information for all of the nodes in the cluster, any node can pass configuration information to the client upon request.

3. The client receives the current list of cache node hostnames and IP addresses. It can then connect to all of the other nodes in the cluster.

The configuration information for Auto Discovery is stored redundantly in each cache cluster node. Client applications can query any cache node and obtain the configuration information for all of the nodes in the cluster.
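From a .NET application the connection itself is just ordinary Memcached client code pointed at the cluster. Here is a minimal sketch assuming the Enyim.Caching Memcached client (Amazon also publishes an ElastiCache Cluster Client build of it that adds Auto Discovery support); the endpoint name below is a placeholder.

using System;
using Enyim.Caching;                 // EnyimMemcached NuGet package
using Enyim.Caching.Configuration;
using Enyim.Caching.Memcached;

class CacheSketch
{
    public static void Main()
    {
        // Placeholder endpoint - substitute your cluster's configuration or node endpoint
        var config = new MemcachedClientConfiguration();
        config.AddServer("my-cluster.abc123.cfg.use1.cache.amazonaws.com", 11211);

        using (var client = new MemcachedClient(config))
        {
            // Cache a value and read it back
            client.Store(StoreMode.Set, "greeting", "hello from ElastiCache");
            Console.WriteLine(client.Get<string>("greeting"));
        }
    }
}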

For more information see – http://cloudacademy.com/blog/amazon-elasticache/, http://www.allthingsdistributed.com/2011/08/amazon-elasticache.html, https://www.sitepoint.com/amazon-elasticache-cache-on-steroids/

Hope this helps.

Forecast Cloudy – SQL Server on Amazon Web Services RDS Limitations, Backups, HA and DR

aws_logo

Continuing the SQL Server on Amazon Web Services RDS journey that I started in my previous post. In that post I covered the basics of RDS, creating a SQL Server instance in RDS, and connecting to it. In this post I would like to cover additional topics that are near and dear to any RDBMS DBA, including backups/restores and HADR, and talk about the limitations of SQL Server on RDS.

Limitations for SQL Server in Amazon Web Services RDS. When you are setting up SQL Server in RDS you need to be aware of the following:

  • You only get the SQL Server database engine. SQL Server ships with a large number of components in addition to the core database services, including Analysis, Reporting and Integration Services, Master Data Services, Distributed Replay, etc. When you set up SQL Server on premises you pick the components that you need. With RDS all you can host is the DB engine service, so if your application architecture involves running SQL Server instances with SSAS or SSRS, those components will have to be hosted elsewhere: either an on-premises server within your network or an EC2 instance in the Amazon cloud. As an architectural best practice, it would make sense to host them in EC2.
  • Although you can log in from SQL Server Management Studio (SSMS), there is no Remote Desktop option available that I could find.
  • Very limited file access.
  • Size limitations. AFAIK the minimum size of a SQL Server RDS instance for Standard or Enterprise Edition is 200 GB and the maximum is 4 TB. Beyond 4 TB you would have to create another instance and probably implement some sort of sharding methodology.
  • Another limitation to be aware of is the number of SQL Server databases an RDS instance can host: only 30 per instance.
  • There are no user-configurable high availability options: no replication, no log shipping, no AlwaysOn, and no manual configuration of database mirroring. Mirroring is enabled for all databases if you opt for a Multi-AZ rollout; the secondary replica hosts the mirrored databases.
  • No distributed transaction support with MSDTC.
  • No FILESTREAM, CDC, SQL Audit, Policy-Based Management, etc.

So if I have no AlwaysOn Availability Groups or clustering, how does HADR work?

image

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments. Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring. In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.

The RDS console shows the Availability Zone of the standby replica (called the secondary AZ), or you can use the command rds-describe-db-instances or the API action DescribeDBInstances to find the secondary AZ. When using the BYOL licensing model, you must have a license for both the primary instance and the standby replica.

In the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes for the failover to complete depends on the database activity and other conditions at the time the primary DB instance became unavailable. Failover times are typically 60-120 seconds. However, large transactions or a lengthy recovery process can increase failover time. The failover mechanism automatically changes the DNS record of the DB instance to point to the standby DB instance.
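Because a failover is essentially a DNS flip, client code mostly needs to be able to reconnect and retry. A minimal sketch using System.Data.SqlClient with a placeholder RDS endpoint, database, and credentials:

using System;
using System.Data.SqlClient;
using System.Threading;

class FailoverSketch
{
    // Placeholder RDS endpoint, database, and credentials
    const string ConnectionString =
        "Server=mydbinstance.abc123.us-east-1.rds.amazonaws.com,1433;" +
        "Database=MyAppDb;User Id=masteruser;Password=********;";

    public static void Main()
    {
        // Retry a few times so a Multi-AZ failover (typically 60-120 seconds) is survivable
        for (int attempt = 1; attempt <= 5; attempt++)
        {
            try
            {
                using (var connection = new SqlConnection(ConnectionString))
                using (var command = new SqlCommand("SELECT @@SERVERNAME", connection))
                {
                    connection.Open();
                    Console.WriteLine("Connected to: " + command.ExecuteScalar());
                    return;
                }
            }
            catch (SqlException ex)
            {
                Console.WriteLine("Attempt " + attempt + " failed: " + ex.Message);
                Thread.Sleep(TimeSpan.FromSeconds(30)); // wait for DNS to point at the standby
            }
        }
    }
}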

Backups and Restores. Backup and restore has always been the easiest way of migrating SQL Server databases. Unfortunately this option is not available for RDS, so DBAs have to fall back on the manual process of creating database schemas and importing data. You can also use the SQL Server Import and Export Wizard to move your data. Amazon has more on that procedure here – http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/SQLServer.Procedural.Importing.html

Interestingly, you can also use the SQLAzureMW tool (http://sqlazuremw.codeplex.com/) to move schema and data from SQL Server to RDS.

mssql-to-rds-1[1]

That allows you to pick source objects to migrate:

mssql-to-rds-5[1]

Then you connect to the destination and transfer the schema and data. To me that's much easier than scripting and transferring the schema and using bcp separately for the data.

For more info see – http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.SQLServer.CommonDBATasks.html, https://youtu.be/7t2-95NDfBU, http://www.nimbo.com/blog/microsoft-sql-server-high-availability-ha-cloud/

Forecast Cloudy – SQL Server on Amazon Web Services RDS

aws_logo

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizeable capacity for an industry-standard relational database and manages common database administration tasks.

The basic building block of Amazon RDS is the DB instance. A DB instance is an isolated database environment in the cloud.  A DB instance can contain multiple user-created databases, and you can access it by using the same tools and applications that you use with a stand-alone database instance. You can create and modify a DB instance by using the Amazon RDS command line interface, the Amazon RDS API, or the AWS Management Console.

Each DB instance runs a DB engine. Amazon RDS currently supports the MySQL, PostgreSQL, Oracle, and Microsoft SQL Server DB engines. Given my background I will illustrate running a SQL Server DB instance in RDS here, but in the future I may venture further, touching MySQL and especially its specialized Amazon cousin known as Aurora. The computation and memory capacity of a DB instance is determined by its DB instance class. You can select the DB instance class that best meets your needs, and if your needs change over time, you can change instance classes. For each DB instance you can select from 5 GB to 3 TB of associated storage capacity; instance storage comes in three types: Magnetic, General Purpose (SSD), and Provisioned IOPS (SSD). They differ in performance characteristics and price, allowing you to tailor your storage performance and cost to the needs of your database.

Amazon cloud computing resources are housed in highly available data center facilities in different areas of the world (for example, North America, Europe, or Asia). Each data center location is called a region. Each region contains multiple distinct locations called Availability Zones, or AZs. Each Availability Zone is engineered to be isolated from failures in other Availability Zones, and to provide inexpensive, low-latency network connectivity to other Availability Zones in the same region. By launching instances in separate Availability Zones, you can protect your applications from the failure of a single location. You can run your DB instance in several Availability Zones, an option called a Multi-AZ deployment. When you select this option, Amazon automatically provisions and maintains a synchronous standby replica of your DB instance in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy and failover support, eliminate I/O freezes, and minimize latency spikes during system backups.

Amazon RDS supports DB instances running several editions of Microsoft SQL Server 2008 R2 and SQL Server 2012. Amazon RDS currently supports Multi-AZ deployments for SQL Server using SQL Server Mirroring as a high-availability failover solution. Amazon also supports the TDE (Transparent Data Encryption) feature in SQL Server, and allows SSL connections to your SQL Server instance as necessary. In order to deliver a managed service experience, Amazon RDS does not provide shell access to DB instances, and it restricts access to certain system procedures and tables that require advanced privileges. Amazon RDS supports access to databases on a DB instance using any standard SQL client application such as Microsoft SQL Server Management Studio. Amazon RDS does not allow direct host access to a DB instance via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection. When you create a DB instance, you are assigned to the db_owner role for all databases on that instance, and you will have all database-level permissions except for those that are used for backups (Amazon RDS manages backups for you).

Obviously, before you proceed you need to sign up for an AWS account. You can do so here – https://aws.amazon.com/. Next you should create an IAM (Identity and Access Management) user.

image

Once you log into the AWS Console, go to IAM, as shown above.

  • In the navigation pane, choose Groups, and then choose Create New Group
  • For Group Name, type a name for your group, such as RDS_Admins, and then choose Next Step.
  • In the list of policies, select the check box next to the AdministratorAccess policy. You can use the Filter menu and the Search box to filter the list of policies.
  • Choose Next Step, and then choose Create Group.

Your new group is listed under Group Name after you are done. Next, create a user and add the user to the group. I covered creating IAM users in my previous blogs, but here it is again:

  • In the navigation pane, choose Users, and then choose Create New Users
  • In box 1, type a user name. Clear the check box next to Generate an access key for each user. Then choose Create.
  • In the Groups section, choose Add User to Groups
  • Select the check box next to your newly created admins group. Then choose Add to Groups.
  • Scroll down to the Security Credentials section. Under Sign-In Credentials, choose Manage Password. Select Assign a custom password. Then type a password in the Password and Confirm Password boxes. When you are finished, choose Apply.

Next, in the AWS Console I will select RDS from the Database section.

image

I will pick SQL Server from the available DB engines.

image

Picking SQL Server Standard for the sake of example, I am presented with the choice of a Multi-Availability Zone deployment for HADR and consistent-throughput storage. For the sake of this quick post I will do what you should not do for production and decline that option; after all, doing things properly does cost money:

image

The next screen opens a more interesting Pandora's box – licensing. I can license my SQL Server here via “bring your own license” or use an Amazon-provided license. Customers using License Mobility through Software Assurance to launch bring-your-own-license Amazon RDS for SQL Server instances must complete a license verification process with Microsoft. That option enables the following checkbox:

image

Here I can also pick my DB instance class. Instance classes are detailed in the Amazon docs here – http://docs.amazonaws.cn/en_us/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html. I am picking a very small testing instance, what Amazon calls a micro instance size – an instance sufficient for testing that should not be used for production applications.

image

After adding some more info, including Availability Zone, backup preferences, and retention, I can hit the Create Instance button.

image

Once Amazon RDS provisions your DB instance, you can use any standard SQL client application to connect to the instance. In order for you to connect, the DB instance must be associated with a security group containing the IP addresses and network configuration that you will use to access the DB instance. So let's connect to my new DB instance using SSMS:

  • On the Instances page of the AWS Management Console, select the arrow next to the DB instance to show the instance details. Note the server name and port of the DB instance, which are displayed in the Endpoint field at the top of the panel, and the master user name, which is displayed in the Username field in the Configuration Details section.
  • Use that endpoint string and your user name to connect (a code-based connection sketch follows the screenshot below).

image
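If you would rather connect from code than from SSMS, the same endpoint and port go into an ordinary ADO.NET connection string. A minimal sketch with placeholder endpoint, credentials, and database name:

using System;
using System.Data.SqlClient;

class RdsConnectSketch
{
    public static void Main()
    {
        // Endpoint, port, and master user name come from the RDS console; these are placeholders
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "mydbinstance.abc123.us-east-1.rds.amazonaws.com,1433",
            UserID = "masteruser",
            Password = "********",
            InitialCatalog = "master"
        };

        using (var connection = new SqlConnection(builder.ConnectionString))
        {
            connection.Open();
            Console.WriteLine("Connected. SQL Server version: " + connection.ServerVersion);
        }
    }
}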

I did run into an issue with VPC security groups and inbound rules. See this post for the proper way to set up the rules – https://msayem.wordpress.com/2014/10/23/unable-to-connect-to-aws-rds-from-sql-server-management-studio/

For more see – https://www.mssqltips.com/sqlservertip/3251/running-sql-server-databases-in-the-amazon-cloud-part-1/, http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html

Forecast Cloudy – Working with AWS DynamoDB in .NET

DynamoDB_Logo

Amazon DynamoDB is AWS's primary NoSQL data storage offering. Announced on January 18, 2012, it is a fully managed NoSQL database service that provides fast and predictable performance along with excellent scalability. DynamoDB differs from other Amazon services by allowing developers to purchase a service based on throughput rather than storage. Although the database will not scale automatically, administrators can request more throughput and DynamoDB will spread the data and traffic over a number of servers using solid-state drives, allowing predictable performance. It offers integration with Hadoop via Elastic MapReduce.

image

The above diagram shows how Amazon offers its various cloud services and where exactly DynamoDB is placed. AWS RDS is a relational database as a service over the Internet from Amazon, while SimpleDB and DynamoDB are NoSQL databases as services. Both SimpleDB and DynamoDB are fully managed, non-relational services. DynamoDB is built for fast, seamless scalability and high performance. It runs on SSDs to provide faster responses and has no limits on request capacity and storage. It automatically partitions your data throughout the cluster to meet demand, while SimpleDB has a storage limit of 10 GB, can only take a limited number of requests per second, and requires you to manage your own partitions. So depending on your needs you have to choose the right solution.

As I already went through the basics of getting started with the AWS .NET SDK in my previous post, I will not go through them here. Instead, there are the following basics to consider:

  • The AWS SDK for .NET provides three programming models for communicating with DynamoDB: the low-level model, the document model, and the object persistence model.
  • The low-level programming model wraps direct calls to the DynamoDB service. You access this model through the Amazon.DynamoDBv2 namespace. Of the three models, the low-level model requires you to write the most code.
  • The document programming model provides an easier way to work with data in DynamoDB. This model is specifically intended for accessing tables and items in tables. You access it through the Amazon.DynamoDBv2.DocumentModel namespace. Compared to the low-level programming model, the document model is easier to code against, but it doesn't expose as many features. For example, you can use this model to create, retrieve, update, and delete items in tables.
  • The object persistence programming model is specifically designed for storing, loading, and querying .NET objects in DynamoDB. You access this model through the Amazon.DynamoDBv2.DataModel namespace; a short sketch of it follows this list.
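As a quick illustration of the object persistence model, here is a sketch only; the POCO class below is hypothetical and must map to a table that already exists (its hash key is assumed to be CustomerID).

using System;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

// Hypothetical POCO mapped to an existing "Customer" table whose hash key is CustomerID
[DynamoDBTable("Customer")]
public class CustomerItem
{
    [DynamoDBHashKey]
    public string CustomerID { get; set; }
    public string CompanyName { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

class ObjectPersistenceSketch
{
    public static void Main()
    {
        var client = new AmazonDynamoDBClient();
        var context = new DynamoDBContext(client);

        // Save maps the object's properties to item attributes
        context.Save(new CustomerItem
        {
            CustomerID = "42",
            CompanyName = "SampleCo42",
            City = "Seattle",
            State = "WA"
        });

        // Load fetches the item back by its hash key and rehydrates the POCO
        var customer = context.Load<CustomerItem>("42");
        Console.WriteLine(customer.CompanyName + ", " + customer.City);
    }
}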

We will start by navigating to the AWS console and creating a table.

image

For the sake of an easy tutorial I will create a Customer table with a couple of fields. Here I have to think about the design of my table a little, in particular around hash and range keys. In DynamoDB, the concept of a “Hash and Range Primary Key” means that a single row in DynamoDB has a unique primary key made up of both the hash and the range key.

  • Hash Primary Key – The primary key is made of one attribute, a hash attribute. For example, a ProductCatalog table can have ProductID as its primary key. DynamoDB builds an unordered hash index on this primary key attribute, which means that every row is keyed off this value. Every row in DynamoDB will have a required, unique value for this attribute. An unordered hash index means what it says – the data is not ordered and you are not given any guarantees about how the data is stored. You won't be able to make queries on an unordered index such as "get me all rows that have a ProductID greater than X". You write and fetch items based on the hash key: for example, "get me the row from that table that has ProductID X". Because you are querying an unordered index, your gets against it are basically key-value lookups, are very fast, and use very little throughput.
  • Hash and Range Primary Key – The primary key is made of two attributes. The first attribute is the hash attribute and the second attribute is the range attribute. For example, the forum Thread table can have ForumName and Subject as its primary key, where ForumName is the hash attribute and Subject is the range attribute. DynamoDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute. This means that every row's primary key is the combination of the hash and range key. You can make direct gets on single rows if you have both the hash and range key, or you can make a query against the sorted range index – for example, "get me all rows from the table with hash key X that have range keys greater than Y". Such queries have better performance and lower capacity usage compared to scans and queries against fields that are not indexed (a query sketch follows this list).
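Here is a sketch of what such a range query looks like in the document programming model; the Thread table, hash key value, and range condition below are hypothetical.

using System;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DocumentModel;

class RangeQuerySketch
{
    public static void Main()
    {
        var client = new AmazonDynamoDBClient();

        // Hypothetical table with ForumName as the hash key and Subject as the range key
        var table = Table.LoadTable(client, "Thread");

        // "Give me all rows with hash key 'DynamoDB' whose Subject sorts after 'How'"
        var filter = new QueryFilter("Subject", QueryOperator.GreaterThan, "How");
        var search = table.Query("DynamoDB", filter);

        while (!search.IsDone)
        {
            foreach (var document in search.GetNextSet())
            {
                Console.WriteLine(document["Subject"].AsString());
            }
        }
    }
}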

In my case, for this very simplistic example, I will use a simple unique hash primary key:

image

After finishing Create Table Wizard I can now see my table in the console.

image

Better yet, I can now easily see and modify my table in AWS Explorer in VS, add/remove indexes, etc.:

image

A few words on indexing. A quick question: when writing a query in any database, keeping the primary key field as part of the query (especially in the WHERE condition) will return results much faster than the alternative. Why? Because in most databases an index is created automatically for the primary key field. That is the case with DynamoDB as well. This index is called the primary index of the table. There is no customization possible with the primary index, so it is seldom discussed. DynamoDB also has two types of secondary indexes:

  • Local Secondary Indexes – an index that has the same hash key as the table, but a different range key. A local secondary index is “local” in the sense that every partition of a local secondary index is scoped to a table partition that has the same hash key.
  • Global Secondary Indexes – an index with a hash and range key that can be different from those on the table. A global secondary index is considered “global” because queries on the index can span all of the data in a table, across all partitions

Local Secondary Indexes consume throughput from the table. When you query records via the local index, the operation consumes read capacity units from the table. When you perform a write operation (create, update, delete) in a table that has a local index, there will be two write operations, one for the table and another for the index; both consume write capacity units from the table. Global Secondary Indexes have their own provisioned throughput: when you query the index, the operation consumes read capacity from the index, and when you perform a write operation (create, update, delete) in a table that has a global index, there will again be two write operations, one for the table and another for the index.

Local Secondary Indexes can only be created when you are creating the table; there is no way to add a Local Secondary Index to an existing table, and once you create the index you cannot delete it. Global Secondary Indexes can be created when you create the table or added to an existing table, and deleting an existing Global Secondary Index is also allowed.

Important Documentation Note: In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled.
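To make the index discussion concrete, here is a sketch of creating a table with a hash and range primary key plus one global secondary index through the low-level model; the table, attribute, and index names are hypothetical and the throughput numbers are arbitrary.

using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class CreateIndexedTableSketch
{
    public static void Main()
    {
        var client = new AmazonDynamoDBClient();

        var request = new CreateTableRequest
        {
            TableName = "Thread",
            AttributeDefinitions = new List<AttributeDefinition>
            {
                new AttributeDefinition { AttributeName = "ForumName", AttributeType = "S" },
                new AttributeDefinition { AttributeName = "Subject", AttributeType = "S" },
                new AttributeDefinition { AttributeName = "LastPostedBy", AttributeType = "S" }
            },
            // Hash and range primary key for the table itself
            KeySchema = new List<KeySchemaElement>
            {
                new KeySchemaElement { AttributeName = "ForumName", KeyType = "HASH" },
                new KeySchemaElement { AttributeName = "Subject", KeyType = "RANGE" }
            },
            ProvisionedThroughput = new ProvisionedThroughput { ReadCapacityUnits = 5, WriteCapacityUnits = 5 },
            // A global secondary index with its own key schema and its own throughput
            GlobalSecondaryIndexes = new List<GlobalSecondaryIndex>
            {
                new GlobalSecondaryIndex
                {
                    IndexName = "LastPostedBy-index",
                    KeySchema = new List<KeySchemaElement>
                    {
                        new KeySchemaElement { AttributeName = "LastPostedBy", KeyType = "HASH" }
                    },
                    Projection = new Projection { ProjectionType = "ALL" },
                    ProvisionedThroughput = new ProvisionedThroughput { ReadCapacityUnits = 5, WriteCapacityUnits = 5 }
                }
            }
        };

        client.CreateTable(request);
    }
}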

Next, I will fire up Visual Studio and create a new AWS console application project:

image

In order to illustrate how easy it is to put data into and get data from a table, I created a very simple code snippet. First I created a very simplistic Customer class, a representation of some mythical company's customer where we track a unique ID, name, city, and state:

using System;

namespace DynamoDBTest
{
    // Simple POCO representing a customer record stored in DynamoDB
    class Customer
    {
        public string CustomerID { get; set; }
        public string CompanyName { get; set; }
        public string City { get; set; }
        public string State { get; set; }

        public Customer(string CustomerID, string CompanyName, string City, string State)
        {
            this.CustomerID = CustomerID;
            this.CompanyName = CompanyName;
            this.City = City;
            this.State = State;
        }

        public Customer()
        {
        }
    }
}

Once that is done, I can add and get this data from DynamoDB. In my sample I used both the low-level and document programming models for interest:

using System;
using System.Collections.Generic;
using System.Configuration;
using System.IO;
using System.Linq;
using System.Text;

using Amazon;
using Amazon.EC2;
using Amazon.EC2.Model;
using Amazon.SimpleDB;
using Amazon.SimpleDB.Model;
using Amazon.S3;
using Amazon.S3.Model;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using Amazon.DynamoDBv2.DocumentModel;

namespace DynamoDBTest
{
    class Program
    {
        public static void Main(string[] args)
        {
            //lets add some sample data
            AddCustomer("1","SampleCo1", "Seattle", "WA");
            AddCustomer("2","SampleCo2", "Reston", "VA");
            AddCustomer("3","SampleCo3", "Minneapolis", "MN");
            Console.WriteLine("Added sample data");
            Customer myCustomer = new Customer();
            myCustomer = GetCustomerByID("1");
            Console.WriteLine("Retrieved Sample Data..." + myCustomer.CustomerID + " " + myCustomer.CompanyName + " " + myCustomer.City + " " + myCustomer.State + " ");
            myCustomer = GetCustomerByID("2");
            Console.WriteLine("Retrieved Sample Data..." + myCustomer.CustomerID + " " + myCustomer.CompanyName + " " + myCustomer.City + " " + myCustomer.State + " ");
            myCustomer = GetCustomerByID("3");
            Console.WriteLine("Retrieved Sample Data..." + myCustomer.CustomerID + " " + myCustomer.CompanyName + " " + myCustomer.City + " " + myCustomer.State + " ");
            Console.Read();

      
        }

        public static void AddCustomer(string CustomerID, string CompanyName, string City, string State)
        {
            Customer myCustomer = new Customer(CustomerID, CompanyName, City, State);
            var client = new AmazonDynamoDBClient();

            // Low-level programming model: build a PutItemRequest with an explicit attribute map
            var myrequest = new PutItemRequest
            {
                TableName = "Customer",
                Item = new Dictionary<string, AttributeValue>
                {
                    {"CustomerID", new AttributeValue {S = myCustomer.CustomerID}},
                    {"CompanyName", new AttributeValue {S = myCustomer.CompanyName}},
                    {"City", new AttributeValue {S = myCustomer.City}},
                    {"State", new AttributeValue {S = myCustomer.State}}
                }
            };

            client.PutItem(myrequest);
        }
        public static Customer GetCustomerByID(string CustomerID)
        {
            var client = new AmazonDynamoDBClient();

            // Document programming model: load the table metadata and fetch the item by its hash key
            var table = Table.LoadTable(client, "Customer");
            var item = table.GetItem(CustomerID);

            Customer myCustomer = new Customer(item["CustomerID"].AsString(), item["CompanyName"].AsString(), item["City"].AsString(), item["State"].AsString());

            return myCustomer;
        }
    }
}

Results when I run the application are pretty simple; however, you can see how easy it is to put an item and get an item. And it's fast, really fast…

image

From VS AWS Explorer I can confirm rows in my table:

image

Hope this was useful. For more information see – http://blogs.aws.amazon.com/net/post/Tx17SQHVEMW8MXC/DynamoDB-APIs, http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForLSI.html, http://yourstory.com/2012/02/step-by-step-guide-to-amazon-dynamodb-for-net-developers/, http://blog.grio.com/2012/03/getting-started-with-amazon-dynamodb.html

Forecast Cloudy – Getting Started with AWS using Microsoft.NET

aws_logo

Switching away from my usual Azure and Google Cloud hangouts, I decided to check out AWS. Moreover, as AWS offers a nice .NET SDK and toolkit for Visual Studio, I decided to start with .NET as my technology for AWS.

The first thing you need to do if you decide to develop with .NET for AWS is to create an AWS account. To sign up for an AWS account:

  • Go to http://aws.amazon.com/  and click Sign Up.
  • Follow the on-screen instructions
  • As part of sign-up they will call you and you will enter a PIN using the phone keypad

AWS sends you a confirmation email after the sign-up process is complete. At any time, you can view your current account activity and manage your account by going to http://aws.amazon.com and clicking My Account/Console. To use the AWS SDK for .NET, you must have a set of valid AWS credentials, which consist of an access key and a secret key. These keys are used to sign programmatic web service requests and enable AWS to verify that the request comes from an authorized source. You can obtain a set of account credentials when you create your account. However, the AWS docs recommend that you do not use these credentials with the AWS SDK for .NET. Instead, they want you to create one or more IAM users and use those credentials. For applications that run on EC2 instances, you can use IAM roles to provide temporary credentials.
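Here is a minimal sketch of picking up one of those IAM-user credential profiles from code, assuming a reasonably recent AWS SDK for .NET (the Amazon.Runtime.CredentialManagement types); the profile name and region are placeholders.

using System;
using Amazon;
using Amazon.EC2;
using Amazon.Runtime;
using Amazon.Runtime.CredentialManagement;

class ProfileSketch
{
    public static void Main()
    {
        // Look up a named profile in the SDK Store / shared credentials file
        var chain = new CredentialProfileStoreChain();
        AWSCredentials credentials;
        if (chain.TryGetAWSCredentials("my-dev-profile", out credentials))
        {
            // Build a client from the IAM user's keys rather than the root account keys
            var ec2 = new AmazonEC2Client(credentials, RegionEndpoint.USWest2);
            var response = ec2.DescribeInstances();
            Console.WriteLine("Reservations: " + response.Reservations.Count);
        }
        else
        {
            Console.WriteLine("Profile not found - create one with the AWS Toolkit or AWS CLI.");
        }
    }
}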

To use the AWS SDK for .NET, you must have the following installed:

  • Microsoft .NET Framework 3.5 or later
  • Visual Studio 2010 or later
  • AWS SDK for .NET
  • AWS Toolkit for Visual Studio, plugin that provides a user interface for managing your AWS resources from Visual Studio, and includes the AWS SDK for .NET

You can install AWS SDK for .NET from NuGet. NuGet always has the most recent versions of the AWSSDK assemblies, and also enables you to install previous versions. NuGet is aware of dependencies between assemblies and installs all required assemblies automatically. Assemblies installed with NuGet are stored with your solution rather than in a central location, such as Program Files. This enables you to install assembly versions specific to a given application without creating compatibility issues for other applications.

To use NuGet from Visual Studio Solution Explorer, right-click on your project and choose Manage NuGet Packages from the context menu.

image

From the Package Manager Console, you can install the desired AWSSDK assemblies using the Install-Package command. For example, to install the AWSSDK.AutoScaling assembly, use the following command:

PM> Install-Package AWSSDK.AutoScaling -Version 3.0.0.1

NuGet will also install any dependencies, such as AWSSDK.Core.

Now let's start a new project in Visual Studio. Since I have installed the AWS Toolkit for Visual Studio, things are really simple here:

image

Toolkit provides some basic templates:

  • AWS Console Project – A console application that makes basic requests to Amazon S3, Amazon SimpleDB, and Amazon EC2. Great for my learning.
  • AWS Empty Project – A console application that does not include any code.
  • AWS Web Project – An ASP.NET application that makes basic requests to Amazon S3, Amazon SimpleDB, and Amazon EC2.

Actually, I could also use the basic built-in VS templates and just add the AWSSDK via NuGet as shown above. But the toolkit is really nice and polished.

Before I start on anything in VS, I would really like to see my AWS items in the VS AWS Explorer. To do so, do the following:

  • In Visual Studio, open AWS Explorer by clicking the View menu and choosing AWS Explorer. You can also open AWS Explorer by typing Ctrl+K, and then pressing the A key.
  • Click the New Account Profile icon to the right of the Profile list.
  • Enter your profile name and access keys into the dialog. As stated previously, I actually went to AWS IAM (Identity and Access Management) and set up a special user so I don't use the root keys:

image

      Now I can see my items in VS AWS Explorer:

      image

For the sake of simplicity I will pick an AWS Console Application here.

image

Once I pick my console application I am presented with a dialog like this one:

Credentials Dialog Box

All it really does is allow me to choose a profile and write the keys into the App.config appSettings section. The sample SDK code AWS provides just lists your AWS resources in the command window:

using System;
using System.Collections.Generic;
using System.Configuration;
using System.IO;
using System.Linq;
using System.Text;

using Amazon;
using Amazon.EC2;
using Amazon.EC2.Model;
using Amazon.SimpleDB;
using Amazon.SimpleDB.Model;
using Amazon.S3;
using Amazon.S3.Model;

namespace AWSConsoleFirst
{
    class Program
    {
        public static void Main(string[] args)
        {
            Console.Write(GetServiceOutput());
            Console.Read();
        }

        public static string GetServiceOutput()
        {
            StringBuilder sb = new StringBuilder(1024);
            using (StringWriter sr = new StringWriter(sb))
            {
                sr.WriteLine("===========================================");
                sr.WriteLine("Welcome to the AWS .NET SDK!");
                sr.WriteLine("===========================================");

                // Print the number of Amazon EC2 instances.
                IAmazonEC2 ec2 = new AmazonEC2Client();
                DescribeInstancesRequest ec2Request = new DescribeInstancesRequest();

                try
                {
                    DescribeInstancesResponse ec2Response = ec2.DescribeInstances(ec2Request);
                    int numInstances = 0;
                    numInstances = ec2Response.Reservations.Count;
                    sr.WriteLine(string.Format("You have {0} Amazon EC2 instance(s) running in the {1} region.",
                                               numInstances, ConfigurationManager.AppSettings["AWSRegion"]));
                }
                catch (AmazonEC2Exception ex)
                {
                    if (ex.ErrorCode != null && ex.ErrorCode.Equals("AuthFailure"))
                    {
                        sr.WriteLine("The account you are using is not signed up for Amazon EC2.");
                        sr.WriteLine("You can sign up for Amazon EC2 at http://aws.amazon.com/ec2");
                    }
                    else
                    {
                        sr.WriteLine("Caught Exception: " + ex.Message);
                        sr.WriteLine("Response Status Code: " + ex.StatusCode);
                        sr.WriteLine("Error Code: " + ex.ErrorCode);
                        sr.WriteLine("Error Type: " + ex.ErrorType);
                        sr.WriteLine("Request ID: " + ex.RequestId);
                    }
                }
                sr.WriteLine();

                // Print the number of Amazon SimpleDB domains.
                IAmazonSimpleDB sdb = new AmazonSimpleDBClient();
                ListDomainsRequest sdbRequest = new ListDomainsRequest();

                try
                {
                    ListDomainsResponse sdbResponse = sdb.ListDomains(sdbRequest);

                    int numDomains = 0;
                    numDomains = sdbResponse.DomainNames.Count;
                    sr.WriteLine(string.Format("You have {0} Amazon SimpleDB domain(s) in the {1} region.",
                                               numDomains, ConfigurationManager.AppSettings["AWSRegion"]));
                }
                catch (AmazonSimpleDBException ex)
                {
                    if (ex.ErrorCode != null && ex.ErrorCode.Equals("AuthFailure"))
                    {
                        sr.WriteLine("The account you are using is not signed up for Amazon SimpleDB.");
                        sr.WriteLine("You can sign up for Amazon SimpleDB at http://aws.amazon.com/simpledb");
                    }
                    else
                    {
                        sr.WriteLine("Caught Exception: " + ex.Message);
                        sr.WriteLine("Response Status Code: " + ex.StatusCode);
                        sr.WriteLine("Error Code: " + ex.ErrorCode);
                        sr.WriteLine("Error Type: " + ex.ErrorType);
                        sr.WriteLine("Request ID: " + ex.RequestId);
                    }
                }
                sr.WriteLine();

                // Print the number of Amazon S3 Buckets.
                IAmazonS3 s3Client = new AmazonS3Client();

                try
                {
                    ListBucketsResponse response = s3Client.ListBuckets();
                    int numBuckets = 0;
                    if (response.Buckets != null &&
                        response.Buckets.Count > 0)
                    {
                        numBuckets = response.Buckets.Count;
                    }
                    sr.WriteLine("You have " + numBuckets + " Amazon S3 bucket(s).");
                }
                catch (AmazonS3Exception ex)
                {
                    if (ex.ErrorCode != null && (ex.ErrorCode.Equals("InvalidAccessKeyId") ||
                        ex.ErrorCode.Equals("InvalidSecurity")))
                    {
                        sr.WriteLine("Please check the provided AWS Credentials.");
                        sr.WriteLine("If you haven't signed up for Amazon S3, please visit http://aws.amazon.com/s3");
                    }
                    else
                    {
                        sr.WriteLine("Caught Exception: " + ex.Message);
                        sr.WriteLine("Response Status Code: " + ex.StatusCode);
                        sr.WriteLine("Error Code: " + ex.ErrorCode);
                        sr.WriteLine("Request ID: " + ex.RequestId);
                    }
                }
                sr.WriteLine("Press any key to continue...");
            }
            return sb.ToString();
        }
    }
}

The result is quite humbling, but to me it beats the usual Hello World statement:

image

My next job is to build something more useful, but now that I am set up, it's just a matter of finding a bit of time to spare.

For more information see – http://docs.aws.amazon.com/AWSSdkDocsNET/latest/V3/DeveloperGuide/net-dg-start-new-project.html, http://aws.amazon.com/net/, http://docs.aws.amazon.com/AWSSdkDocsNET/latest/V3/DeveloperGuide/net-dg-programming-techniques.html