Alienus Terrae – Running a Simple Kubernetes Cluster on Google Cloud


Did you know that you can create a basic managed Kubernetes cluster (GKE) on GCP from the command line in less than 20 minutes?

You will, of course, need a few prerequisites:

  • GCP account
  • Google Cloud SDK installed. The Cloud SDK is a set of command-line tools for developing with Google Cloud. You can use these tools to access Compute Engine, Cloud Storage, BigQuery, and other Google Cloud services directly from the command line. The download and install directions can be found at https://cloud.google.com/sdk/install
  • Decide what kind of cluster you want to create. I will create a very simple single-zone cluster in my example, but you may want to go more sophisticated with a multi-zonal or regional cluster. For more details see – https://cloud.google.com/kubernetes-engine/docs/concepts/types-of-clusters
  • Enable the GCP Kubernetes Engine API – a one-line command for this is sketched below.
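
For that last prerequisite, the API can also be enabled from the Cloud SDK itself. A minimal sketch, assuming gcloud is already initialized and your project is set as the default:

# Enable the Kubernetes Engine API for the current project
gcloud services enable container.googleapis.com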

Once you are done with the prerequisites, start up your GCP Cloud SDK command line and initialize the SDK with this command:

 gcloud init

This runs some checks and prompts you to log in to GCP. If you proceed, it opens a browser window to authenticate you, after which your browser is redirected to a screen like this:

[Screenshot: browser confirmation after gcloud authentication]

Next, set your default project using its project ID. I am using my ID; yours will be different:

gcloud config set project gennadyk1-221817

You should get a response like this acknowledging that the default project is set:

[Screenshot: gcloud confirming the default project]

Now update the GCP SDK components to the latest version:

gcloud components update

You will see some scrolling command-line magic; agree to continue, and after a bit you will get confirmation that the components are up to date.

Next I will create the cluster using the gcloud beta command:

gcloud beta container clusters create gennadyk
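
This accepts the defaults for everything. If you want to pin the zone, node count, and machine type explicitly, the same command can be extended roughly like this (the zone and machine type below are just examples, adjust them for your own project):

gcloud beta container clusters create gennadyk --zone us-east1-b --num-nodes 3 --machine-type n1-standard-1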

Navigating to Kubernetes Engine in the GCP Console, you can see the cluster that was created:

[Screenshot: the new cluster in the GCP Console Kubernetes Engine page]

You can see confirmation in the SDK command line as well:

[Screenshot: cluster creation confirmation in the SDK command line]
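
If you also want to manage the new cluster with kubectl, you can pull its credentials into your kubeconfig. A rough sketch, assuming the cluster landed in zone us-east1-b (use whatever zone yours was created in):

# Fetch kubeconfig credentials for the cluster, then verify kubectl can reach it
gcloud container clusters get-credentials gennadyk --zone us-east1-b
kubectl get nodes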

So, as you can see, I created a GKE Kubernetes cluster with a one-line command in about five minutes. Pretty cool, isn't it?

For more on the GCP SDK see – https://medium.com/mindorks/getting-started-with-google-cloud-sdk-40e806c07460, https://cloud.google.com/sdk/docs/

Hope this helps.

 

Steering in the Clouds – Running SQL Server on Azure Kubernetes Managed Service (AKS)


In this post I would love to combine my favorite RDBMS, SQL Server, with the hottest Azure service out there – AKS. AKS is Azure's hosted, Kubernetes-based container orchestration service. AKS reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Azure. As a hosted Kubernetes service, Azure handles critical tasks like health monitoring and maintenance for you.

Why run SQL Server in containers orchestrated by Kubernetes?

Over the past year, Kubernetes — also known as K8s — has become a dominant topic of conversation in the infrastructure world. Given its pedigree of literally working at Google-scale, it makes sense that people want to bring that kind of power to their DevOps stories; container orchestration turns many tedious and complex tasks into something as simple as a declarative config file.

The rise of orchestration is predicated on a few things, though. First, organizations have moved toward breaking up monolithic applications into microservices. However, the resulting environments have hundreds (or thousands) of these services that need to be managed. Second, infrastructure has become cheap and disposable — if a machine fails, it’s dramatically cheaper to replace it than triage the problems.

So, to solve the first issue, orchestration relies on the boon of the second; it manages services by simply letting new machines, running the exact same containers, take the place of failed ones, which keeps a service running without any manual interference.

However, up until recently Kubernetes was only used for stateless apps and couldn't work with stateful applications and databases. Managing state in Kubernetes is difficult because the system's dynamism is too chaotic for most databases to handle, especially SQL databases that offer strong consistency.

Only since StatefulSets were introduced in Kubernetes can we start thinking about running databases there. By far the most common way to run a database, StatefulSets are a feature fully supported as of the Kubernetes 1.9 release. Using them, each of your pods is guaranteed the same network identity and disk across restarts, even if it is rescheduled to a different physical machine. In AKS, in order to run an RDBMS like SQL Server, I will need to create a persistent volume.
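
To make the idea concrete, here is a minimal, generic StatefulSet sketch showing the volumeClaimTemplates mechanism that gives each pod its own stable disk. The names and image are hypothetical, and this is not the manifest I use later in this post:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db
  replicas: 1
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
      - name: db
        image: example/db:latest   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 8Gi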

A Persistent Volume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This allows our pods to go up and down while the volume stays available.

[Diagram: architecture overview]

So all you really need to set up SQL Server in the Azure Kubernetes PaaS service (AKS) is an Azure account. First, let's install the Azure CLI. I am using a Windows machine, but recently I was able to install and run the Azure CLI on a Mac as well, of course. Instructions for the Windows install can be found here – https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?view=azure-cli-latest.

If you don't want to install the Azure CLI on your machine, you can simply use Azure Cloud Shell in the Azure Portal – https://docs.microsoft.com/en-us/azure/cloud-shell/overview. Recently I used GCP Cloud Shell, and this, as well as Azure Cloud Shell, is a great option if you don't want to or cannot install a CLI/SDK on your machine.

Once you have installed the CLI, let's log in to your Azure subscription.

az login

This will launch your default browser for you to log in. Once you are logged in, you can list the subscriptions you have via this command:

az account list

The picture below shows the output of the above command; I erased all of the unique information.

[Screenshot: az account list output]
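
If the subscription marked as default is not the one you want to work in, switch to the right one before creating anything. A small sketch, with a placeholder subscription name:

az account set --subscription "My Subscription Name"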

Once the login process is complete and you are operating in the right subscription,  the resource group to hold all the objects of the Kubernetes cluster can be created:

az group create --name GennadyKSQLK8s --location eastus

You should be able to see the new resource group in the Azure Portal UI:

[Screenshot: the new resource group in the Azure Portal]

Once the resource group is created, we can build the AKS cluster in that resource group:

az aks create --resource-group GennadyKSQLK8s --name mySQLK8sCluster --node-count 2 --generate-ssh-keys

There is a lot going on behind the scenes of this one command, so you may have to be a bit more patient, something I am not known for:

[Screenshot: az aks create output]

In a few minutes we are done and the cluster is created. Next, let's install kubectl to manage the Kubernetes cluster:

az aks install-cli

Set up the PATH for kubectl:

set PATH=%PATH%;C:\Users\Gennady\.azure-kubectl

Next, let's grab credentials in order to connect to and manage the cluster:

az aks get-credentials --resource-group=GennadyKSQLK8s  --name=mySQLK8sCluster

The cluster is set up and we can connect. Let's test it by listing the cluster nodes:

kubectl get nodes

The output is below:

[Screenshot: kubectl get nodes output]

These are the two nodes of the cluster, and it is ready to host a SQL Server instance in a pod. Now we need to create a new persistent volume and an instance of SQL Server on Linux.

We will begin by creating the SA password for SQL Server in the Kubernetes cluster. Kubernetes can manage sensitive configuration information like passwords as secrets.

kubectl create secret generic mssql --from-literal=SA_PASSWORD="MyP@ssw0rd1"
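
To double-check that the secret landed in the cluster without printing the password itself, something like this should work:

kubectl get secrets
kubectl describe secret mssql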

Now we need a persistent volume and a persistent volume claim in Kubernetes. Here are the steps to get those created:

      • Create a manifest to define the storage class and the persistent volume claim. The manifest specifies the storage provisioner, parameters, and reclaim policy. The Kubernetes cluster uses this manifest to create the persistent storage. I will use the following YAML to define my storage class and persistent volume claim:
        kind: StorageClass
        apiVersion: storage.k8s.io/v1beta1
        metadata:
          name: azure-disk
        provisioner: kubernetes.io/azure-disk
        parameters:
          storageaccounttype: Standard_LRS
          kind: Managed
        ---
        kind: PersistentVolumeClaim
        apiVersion: v1
        metadata:
          name: mssql-data
          annotations:
            volume.beta.kubernetes.io/storage-class: azure-disk
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 8Gi
        

        I will save the above YAML as disk.yaml on my computer.

      • Next, we create the persistent volume claim in Kubernetes. Let's run this command:
        kubectl apply -f C:\Users\Gennady\Downloads\disk.yaml

        Obviously, your YAML file location may be different from mine above.
        The result is shown below:

        [Screenshot: kubectl apply output]
        The persistent volume is automatically provisioned as an Azure managed disk and bound to the persistent volume claim.

      • Verify the persistent volume claim. We can run the following command:
        kubectl describe pvc mssql-data

        The output can be seen below:

        [Screenshot: kubectl describe pvc mssql-data output]

Now you should be able to see this volume in the portal as part of our resource group:

[Screenshot: the provisioned disk shown in the Azure Portal]

Now that the persistent volume is added, we need to add the SQL Server deployment. The deployment creates a replica set, and the replica set creates the pod. The deployment is described via a YAML manifest like the one below:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: microsoft/mssql-server-linux
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD 
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssql-data
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
spec:
  selector:
    app: mssql
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
  type: LoadBalancer

I will save the above YAML to my computer as deployment.yaml. Now I can run this to deploy:

kubectl apply -f C:\Users\Gennady\Downloads\deployment.yaml

You should see the following output:

[Screenshot: kubectl apply output for deployment.yaml]

The deployment and service are created. The SQL Server instance is in a container, connected to persistent storage. Now let's verify that the pod was created via kubectl get pods:

[Screenshot: kubectl get pods output]

Now let's verify that the services are running via kubectl get services:

[Screenshot: kubectl get services output]

This command returns services that are running, as well as the internal and external IP addresses for the services.
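
For reference, these are the two verification commands as plain kubectl calls; the EXTERNAL-IP of the mssql-deployment LoadBalancer service is the address used to connect in the next step:

kubectl get pods
kubectl get services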

Finally, let's connect to our SQL Server instance using sqlcmd:

[Screenshot: sqlcmd connected to the SQL Server instance]
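
A sketch of what that connection looks like from a machine with the SQL Server command-line tools installed; the IP address below is a placeholder for the EXTERNAL-IP returned by kubectl get services, and the password is the one stored in the mssql secret earlier:

sqlcmd -S <external-ip>,1433 -U sa -P "MyP@ssw0rd1"
SELECT @@SERVERNAME
GO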

Hope this helps, and happy steering of your cloud databases. For more see – https://kubernetes.io/docs/home/?path=browse, https://docs.microsoft.com/en-us/sql/linux/tutorial-sql-server-containers-kubernetes?view=sql-server-2017, https://joeydantoni.com/2018/03/29/getting-started-with-sql-server-and-kubernetes/, https://cloudblogs.microsoft.com/sqlserver/2018/05/29/sql-server-at-the-red-hat-summit-2018/.

 

Data Cargo In The Clouds – Running SQL Server 2017 in Docker


In my previous post I attempted to explain my new interest in Docker and in running data-centric applications in containers. Starting here, I will show how to install SQL Server for Linux in a container on Azure IaaS.

Let's start by creating a Linux VM in Azure that will run Docker for us. I don't want to spend time and space going through the steps in the portal or PowerShell, since I already went through those in a previous post; a good video tutorial can also be found online.

Assuming you have successfully created the Linux VM and can log in like this:

[Screenshot: SSH login to the Linux VM]

Next we will install Docker on that VM. We will get the latest and greatest version of Docker from the official Docker repository.

  • First, add the GPG key for the official Docker repository to the system:

    $ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    
  • Add the Docker repository to APT sources:

    $ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  • Next, update the package database with the Docker packages from the newly added repo:

    $ sudo apt-get update
  • Make sure you are about to install from the Docker repo instead of the default Ubuntu 16.04 repo:
    apt-cache policy docker-ce

    You should see following output:

    docker-ce:
      Installed: (none)
      Candidate: 17.03.1~ce-0~ubuntu-xenial
      Version table:
         17.03.1~ce-0~ubuntu-xenial 500
            500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
         17.03.0~ce-0~ubuntu-xenial 500
            500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
    
    

    Notice that docker-ce is not installed, but the candidate for installation is from the Docker repository for Ubuntu 16.04. The docker-ce version number might be different.

  • Finally, install Docker:
    $ sudo apt-get install -y docker-ce
  • Make sure it's installed and running:
    $ sudo systemctl status docker

    The output should be similar to the one below, showing that the daemon is started and running:

    [Screenshot: systemctl status docker output]

Next we need to pull the latest SQL Server Docker image:

$ sudo docker pull microsoft/mssql-server-linux

Once the image is pulled and extracted, you should see output similar to this:

[Screenshot: docker pull output]

Now let's run the container image with Docker:

$ sudo docker run -e 'ACCEPT_EULA=Y' -e 'MSSQL_SA_PASSWORD=<YourStrong!Passw0rd>' -e 'MSSQL_PID=Developer' --cap-add SYS_PTRACE -p 1401:1433 -d microsoft/mssql-server-linux

Now we are running a Docker container with SQL Server 2017 in Azure. We should be able to list our containers like this:

 $ sudo docker ps -a

And see output like:

[Screenshot: docker ps -a output]

Connect to SQL Server in the container.

The following steps use the SQL Server command-line tool, sqlcmd, inside the container to connect to SQL Server. First, let's connect to bash inside the container using the docker exec command. Note that I am providing the container ID here as a parameter, fetched from the previous output of the ps command:

$ sudo docker exec -it d95734b7f9ba  "bash"

Now let's connect via sqlcmd:

/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P '<YourStrong!Passw0rd>'
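
Note that inside the container SQL Server listens on its default port 1433, while the docker run command above mapped it to port 1401 on the VM. If you also install the mssql-tools package on the VM itself (an extra step not shown here), you could connect without entering the container, roughly like this:

sqlcmd -S localhost,1401 -U SA -P '<YourStrong!Passw0rd>'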

Now let's run the simplest of queries:

SELECT @@version
GO

You should see output like:

[Screenshot: SELECT @@version output]

Now you can go ahead and start creating databases, tables, moving data, etc.
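
As a tiny illustrative example, with made-up database and table names, a first database and table could be created from the same sqlcmd session:

CREATE DATABASE TestDB;
GO
USE TestDB;
CREATE TABLE dbo.Inventory (Id INT PRIMARY KEY, Name NVARCHAR(50), Quantity INT);
INSERT INTO dbo.Inventory VALUES (1, N'banana', 150);
SELECT * FROM dbo.Inventory;
GO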

For more see – https://docs.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker, https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-docker. SQL Server images on Docker Hub – https://hub.docker.com/r/microsoft/mssql-server-linux/. See also https://mathaywardhill.com/2017/05/08/how-to-attach-a-sql-server-database-in-a-linux-docker-container/

 

Hope this helps.

Data Cargo In The Clouds – Data and Containerization On The Cloud Platforms


This is a topic I have planned to pivot to for quite a while, but I either had no bandwidth/time with moving to Seattle or some other excuse like that. The topic is interesting to me as it sits at the intersection of technologies where I used to spend quite a bit of time, including:

  • Data, including both traditional RDBMS databases and NoSQL designs
  • Cloud, including Microsoft Azure, AWS and GCP
  • Containers and container orchestration (an area that is new to me)

Those who worked with me displayed various degrees of agreement, but more often annoyance, with me pushing this topic as a growing part of the data engineering discipline, but I truly believe in it. Before we start combining all these areas and doing something useful in code, I will spend some time here on the most basic concepts of what I am about to get into.

I will skip an introduction to the basic concepts of RDBMS, NoSQL, or Big Data/Hadoop technologies; unless you have been hiding under a rock for the last five years, you should be quite aware of those. That brings us straight to containers.

As definition states – “A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments and help reduce conflicts between teams running different software on the same infrastructure.”

Containers are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. This could be from a developer’s laptop to a test environment, from a staging environment into production, and perhaps from a physical machine in a data center to a virtual machine in a private or public cloud.


But why containers if I already have VMs? 

VMs take up a lot of system resources. Each VM runs not just a full copy of an operating system, but a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles. In contrast, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program.
What this means in practice is that you can put two to three times as many applications on a single server with containers as you can with a VM. In addition, with containers you can create a portable, consistent operating environment for development, testing, and deployment.

On Linux, containers run on top of LXC. This is a userspace interface for the Linux kernel containment features. It includes an application programming interface (API) to enable Linux users to create and manage system or application containers.

Docker is an open platform that makes it easier to create, deploy, and run applications using containers.

  • The heart of Docker is the Docker Engine, the part of Docker that creates and runs Docker containers. A Docker container is a live, running instance of a Docker image. Docker Engine is a client-server application with the following components:
    • A server, which is a continuously running service called a daemon process.
    • A REST API, which programs use to talk to the daemon and instruct it what to do.
    • A command line interface (CLI) client.

    [Diagram: Docker Engine components]

  • The command line interface client uses the Docker REST API to interact with the Docker daemon through CLI commands. Many other Docker applications also use the API and CLI. The daemon process creates and manages Docker images, containers, networks, and volumes.
  • The Docker client is the primary way Docker users communicate with Docker. When we use a command like "docker run", the client sends it to dockerd, which carries it out.

You will build Docker images using Docker and deploy these into what are known as Docker registries. When we run the docker pull and docker run commands, the required images are pulled from our configured registry. Using the docker push command, an image can be uploaded to our configured registry.

Finally, by deploying instances of images from your registry, you deploy containers. We can create, run, stop, or delete a container using the Docker CLI. We can connect a container to more than one network, or even create a new image based on its current state. By default, a container is well isolated from other containers and its host machine. A container is defined by its image and the configuration options that we provide when we create or run it.
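
Putting the pieces above together, a typical build, push, and run round trip with the CLI looks roughly like this; the image and registry names are made up for illustration:

# Build an image from a Dockerfile in the current directory and tag it
docker build -t myregistry/myapp:1.0 .
# Upload the image to the configured registry
docker push myregistry/myapp:1.0
# On any machine, pull the image back down and run a container from it
docker pull myregistry/myapp:1.0
docker run -d --name myapp myregistry/myapp:1.0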

For more info on containers, LXC and Docker see – https://blog.scottlowe.org/2013/11/25/a-brief-introduction-to-linux-containers-with-lxc/, https://www.docker.com/what-container#/virtual_machines, http://searchservervirtualization.techtarget.com/definition/container-based-virtualization-operating-system-level-virtualization

That brings us to container orchestration engines. While the CLI meets the needs of managing one container on one host, it falls short when it comes to managing multiple containers deployed on multiple hosts. To go beyond the management of individual containers, we must turn to orchestration tools. Orchestration tools extend lifecycle management capabilities to complex, multi-container workloads deployed on a cluster of machines.

Some of the well known orchestration engines include:

Kubernetes. Almost a standard nowadays, originally developed at Google. Kubernetes' architecture is based on a master server with multiple minions (worker nodes). The command line tool, kubectl, connects to the API endpoint of the master to manage and orchestrate the minions. The service definition, along with the rules and constraints, is described in a YAML or JSON file. For service discovery, Kubernetes provides a stable IP address and DNS name that corresponds to a dynamic set of pods. When a container running in a Kubernetes pod connects to this address, the connection is forwarded by a local agent (called kube-proxy) running on the source machine to one of the corresponding backend containers. Kubernetes supports user-implemented application health checks. These checks are performed by the kubelet running on each minion to ensure that the application is operating correctly.

[Diagram: Kubernetes architecture]

For more see – https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/#kubernetes-is

Apache Mesos. This is an open source cluster manager that simplifies the complexity of running tasks on a shared pool of servers. A typical Mesos cluster consists of one or more servers running the mesos-master and a cluster of servers running the mesos-slave component. Each slave is registered with the master to offer resources. The master interacts with deployed frameworks to delegate tasks to slaves. Unlike other tools, Mesos ensures high availability of the master nodes using Apache ZooKeeper, which replicates the masters to form a quorum. A high availability deployment requires at least three master nodes. All nodes in the system, including masters and slaves, communicate with ZooKeeper to determine which master is the current leading master. The leader performs health checks on all the slaves and proactively deactivates any that fail.

 

Over the next couple of posts I will create containers running SQL Server and other data stores, both RDBMS and NoSQL, deploy these to the cloud, and finally attempt to orchestrate them as well. So let's move from theory and pictures into the world of practical data engine deployments.

Hope you will find this detour interesting.